Holo3: H Company's SOTA Foundation Model for Desktop Agents

H Company unveils Holo3, a high-performance Mixture-of-Experts model family that sets a new industry standard for autonomous desktop application control.

by HowAIWorks Team

Introduction

The race for truly autonomous AI agents has reached a new milestone with the release of Holo3, a state-of-the-art family of multimodal models designed by the Paris-based startup H Company (formerly Holistic AI). Built specifically for the complex task of controlling graphical user interfaces (GUIs), Holo3 represents a significant leap forward in turning what a model sees on screen into concrete actions, allowing AI to navigate desktop environments as a human would.

H Company, led by Stanford researcher Charles Kantor and Google DeepMind veteran Laurent Sifre (a key contributor to the AlphaGo project), has quickly become a powerhouse in the European AI scene. With a massive $220 million seed round backed by investors like Eric Schmidt, Amazon, Samsung, and UiPath, the company is positioning Holo3 as the foundational infrastructure for the next generation of business automation.

Architecture: The Mixture-of-Experts Advantage

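Holo3 is built on a Mixture-of-Experts (MoE) architecture. Instead of running every parameter on every token, an MoE layer routes each token to a small subset of specialized "expert" sub-networks, so the model can hold a large total capacity while paying roughly the per-token compute of a much smaller dense model. The model names reflect this split: "A10B" and "A3B" denote roughly 10 billion and 3 billion active parameters per token, out of 122 billion and 35 billion in total. The family consists of two primary versions: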

  • Holo3-122B-A10B (The Flagship): This high-capacity model is the current SOTA leader. It is available exclusively through H Company's platform, priced at $0.40 per million input tokens and $3.00 per million output tokens.
  • Holo3-35B-A3B (The Open Version): Released under the Apache 2.0 license, this model is available on Hugging Face. It offers a more accessible entry point for developers, with a free Inference API tier (limited to 10 requests per minute) and a paid tier at $0.25 per million input tokens and $1.80 per million output tokens.
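The announcement does not describe Holo3's router, so the following is only a generic illustration of the MoE idea: a minimal top-k gating layer in plain Python. The expert count, k=2, the dot-product router, and the toy "experts" are all assumptions chosen for readability, not Holo3's configuration. The point to notice is that only the k selected experts are ever called, which is why per-token cost tracks active rather than total parameters.

```python
import math
import random

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(token, router_weights, experts, k=2):
    """Send one token vector to its top-k experts and mix their outputs.

    token: list of d floats
    router_weights: one length-d vector per expert
    experts: callables mapping a length-d vector to a length-d vector
    """
    # Router logit per expert: dot product of the token with that expert's vector.
    logits = [sum(t * w for t, w in zip(token, wv)) for wv in router_weights]
    # Keep only the k highest-scoring experts; the others are never evaluated.
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    gates = softmax([logits[i] for i in top])  # renormalize over the chosen k
    out = [0.0] * len(token)
    for gate, i in zip(gates, top):
        y = experts[i](token)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, top

def make_expert(scale):
    # Toy expert: scales its input; real experts are feed-forward networks.
    return lambda v: [scale * x for x in v]

random.seed(0)
d, n_experts = 4, 8
router = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n_experts)]
experts = [make_expert(s) for s in range(1, n_experts + 1)]
y, chosen = moe_layer([0.5, -1.0, 0.25, 2.0], router, experts, k=2)
print("experts evaluated:", sorted(chosen), "out of", n_experts)
```

With 8 experts and k=2, three quarters of the expert computation is skipped for each token; production MoE layers apply the same selection to batches of tokens with learned router weights and feed-forward experts.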

Three-Stage Training Loop

What sets Holo3 apart is its rigorous training methodology. H Company utilized a closed-loop system consisting of three distinct phases to ensure the model can handle real-world unpredictability:

  1. Synthetic Navigation Generation: The model starts by learning from high-quality synthetic examples of interface navigation based on predefined scenarios.
  2. Expansion Beyond Initial Conditions: To prevent overfitting and prepare the model for "edge cases," the synthetic data is expanded to include non-standard situations and unexpected UI behaviors.
  3. Curated Selection and RL: Finally, all training examples undergo a curated selection process and are refined through Reinforcement Learning (RL) to optimize for success rates and safety.
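H Company has not published code for this loop. The sketch below only illustrates the shape of a generate, expand, and select pipeline over toy trajectories; every function name, the perturbation list, and the reward function are hypothetical. Stage 3 is reduced to a reward-threshold filter (a rejection-sampling-style stand-in), and the RL fine-tuning itself is not shown.

```python
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    scenario: str
    actions: list
    perturbations: list = field(default_factory=list)

# Stage 1 (hypothetical): scripted demonstrations for predefined scenarios.
def generate_synthetic(scenarios):
    return [Trajectory(s, [f"open:{s}", f"fill:{s}", "submit"]) for s in scenarios]

# Stage 2 (hypothetical): each perturbation prepends the recovery steps it forces.
PERTURBATIONS = {
    "popup_dialog": ["dismiss_popup"],
    "slow_load": ["wait"],
    "moved_button": ["scroll", "relocate_button"],
}

def expand(trajectories):
    expanded = list(trajectories)  # keep the clean originals
    for t in trajectories:
        for name, recovery in PERTURBATIONS.items():
            expanded.append(Trajectory(t.scenario, recovery + t.actions, [name]))
    return expanded

# Stage 3 (stand-in): keep only trajectories whose reward clears a threshold.
def curate(trajectories, reward_fn, threshold):
    return [t for t in trajectories if reward_fn(t) >= threshold]

def toy_reward(t):
    # Toy verifier: 1.0 for ending in "submit", minus a small penalty per step.
    success = 1.0 if t.actions[-1] == "submit" else 0.0
    return success - 0.05 * len(t.actions)

candidates = expand(generate_synthetic(["invoice", "crm_update"]))
data = curate(candidates, toy_reward, threshold=0.78)
print(f"kept {len(data)} of {len(candidates)} trajectories")
```

Here the "moved_button" variants need two extra recovery steps, fall below the threshold, and are dropped, so 6 of the 8 candidates survive.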

To support this massive undertaking, H Company developed a proprietary "Corporate Environment Generator," where agents build entire web applications from scratch, creating a playground of verifiable tasks for the model to master.
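The announcement does not say how tasks in the generated applications are verified. One common way to make a task verifiable is to pair the instruction with a checker that inspects the application's final state rather than the agent's individual clicks. The sketch assumes a toy in-memory CRM; `ToyCRM`, `VerifiableTask`, and `make_task` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ToyCRM:
    # Hypothetical generated app: contacts keyed by email address.
    contacts: dict = field(default_factory=dict)

    def add_contact(self, email, name, stage="lead"):
        self.contacts[email] = {"name": name, "stage": stage}

    def update_stage(self, email, stage):
        self.contacts[email]["stage"] = stage

@dataclass
class VerifiableTask:
    instruction: str
    check: Callable[[ToyCRM], bool]  # judges the final app state, not the clicks

def make_task(app):
    # Seed the app, then describe a goal whose success is checkable from state.
    app.add_contact("ana@example.com", "Ana")
    return VerifiableTask(
        instruction="Move Ana (ana@example.com) to the 'customer' stage.",
        check=lambda a: a.contacts.get("ana@example.com", {}).get("stage") == "customer",
    )

app = ToyCRM()
task = make_task(app)
print(task.check(app))  # False: no action taken yet
app.update_stage("ana@example.com", "customer")  # stands in for the agent's GUI actions
print(task.check(app))  # True
```

Because the checker only reads state, any sequence of GUI actions that reaches the goal passes, which is what makes such generated tasks usable as training and evaluation signals.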

Performance: Breaking Benchmarks

The flagship Holo3-122B-A10B model has already claimed the top spot on the OSWorld-Verified benchmark, scoring a record-breaking 78.85%. This benchmark is widely considered the most rigorous test for desktop interaction, requiring the model to solve multi-step problems across various operating systems and applications.
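OSWorld-style evaluations let an agent act for a bounded number of steps and then judge success from the resulting environment state. The loop below is a minimal, hypothetical sketch of that protocol; `run_episode`, the toy typing task, and the default 15-step budget are assumptions for illustration, not the OSWorld harness or Holo3's interface.

```python
from typing import Callable, Dict

def run_episode(env_state: Dict, policy: Callable[[Dict], str],
                apply_action: Callable[[Dict, str], None],
                check: Callable[[Dict], bool], max_steps: int = 15) -> bool:
    """Let the policy act until it says DONE or the step budget runs out,
    then judge success only from the final state."""
    for _ in range(max_steps):
        action = policy(env_state)       # observe the state, choose one action
        if action == "DONE":
            break
        apply_action(env_state, action)  # mutate the environment
    return check(env_state)

# Toy task: type the word "report" into a text field, one character per step.
TARGET = "report"

def toy_policy(state):
    typed = state["field"]
    return "DONE" if typed == TARGET else "type:" + TARGET[len(typed)]

def toy_apply(state, action):
    state["field"] += action.split(":", 1)[1]

def toy_check(state):
    return state["field"] == TARGET

print(run_episode({"field": ""}, toy_policy, toy_apply, toy_check))               # True
print(run_episode({"field": ""}, toy_policy, toy_apply, toy_check, max_steps=3))  # False
```

A benchmark score is then the fraction of tasks for which an episode like this ends in a passing state; a too-small step budget turns otherwise solvable multi-step tasks into failures.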

H Company also introduced the H Corporate Benchmarks, featuring 486 multi-step tasks across four critical business categories:

  • E-commerce: Handling complex purchasing and logistics workflows.
  • Business Software: Interacting with ERPs and CRMs.
  • Collaboration Tools: Coordinating across project management suites.
  • Cross-App Scenarios: The "Holy Grail" of agents, requiring synchronization between multiple systems simultaneously (e.g., extracting data from a PDF, matching it against a budget in Excel, and sending automated emails in Outlook).
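H Company's per-category results are not reproduced here. As an illustration of how a multi-category suite like this is commonly summarized, the sketch computes per-category and overall success rates from (category, passed) pairs; the `summarize` helper and the sample results are invented for the example and are not Holo3 scores.

```python
from collections import defaultdict

def summarize(results):
    """results: iterable of (category, passed) pairs, one per task."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for category, passed in results:
        totals[category] += 1
        passes[category] += int(passed)
    per_category = {c: passes[c] / totals[c] for c in totals}
    micro = sum(passes.values()) / sum(totals.values())     # every task weighs the same
    macro = sum(per_category.values()) / len(per_category)  # every category weighs the same
    return per_category, micro, macro

# Made-up illustrative results, not Holo3's benchmark numbers.
sample = (
    [("E-commerce", True)] * 3 + [("E-commerce", False)] * 1
    + [("Business Software", True)] * 1 + [("Business Software", False)] * 1
    + [("Collaboration Tools", True)] * 2
    + [("Cross-App Scenarios", True)] * 1 + [("Cross-App Scenarios", False)] * 3
)
per_cat, micro, macro = summarize(sample)
print(per_cat)
print(f"micro={micro:.3f} macro={macro:.3f}")
```

The micro average (7/12 ≈ 0.583 here) weights every task equally, while the macro average (0.625) weights every category equally; the two diverge whenever categories contain different numbers of tasks, so it is worth checking which one a headline figure uses.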

Conclusion

Holo3 isn't just an improvement over existing models; it is a specialized tool built for a future where AI handles the "busy work" of desktop computing. By combining a powerhouse MoE architecture with a sophisticated synthetic training environment, H Company has moved us closer to a world where agents can seamlessly navigate the software we use every day.

Frequently Asked Questions

What is Holo3?
Holo3 is a family of multimodal models from H Company designed specifically to navigate and control graphical user interfaces (GUIs) across desktop environments.

Is Holo3 open source?
Yes. The Holo3-35B-A3B model is open-sourced under the Apache 2.0 license on Hugging Face, while the flagship 122B version is accessible via H Company's platform.

How does Holo3 perform on benchmarks?
Holo3-122B-A10B currently ranks #1 on the OSWorld-Verified benchmark with a score of 78.85%, making it the leading model for desktop-based tasks.