Introduction
The race for truly autonomous AI agents has reached a new milestone with the release of Holo3, a state-of-the-art family of multimodal models designed by the Paris-based startup H Company (formerly Holistic AI). Built specifically for the complex task of controlling graphical user interfaces (GUIs), Holo3 is a significant step forward for models that turn what they see on screen into actions, allowing AI to navigate desktop environments as a human would.
H Company, led by Stanford researcher Charles Kantor and Google DeepMind veteran Laurent Sifre (a key contributor to the AlphaGo project), has quickly become a powerhouse in the European AI scene. With a massive $220 million seed round backed by investors like Eric Schmidt, Amazon, Samsung, and UiPath, the company is positioning Holo3 as the foundational infrastructure for the next generation of business automation.
Architecture: The Mixture-of-Experts Advantage
Holo3 is built on a Mixture-of-Experts (MoE) architecture. Because only a subset of the model's expert parameters is activated for each token, an MoE model can cover diverse and specialized tasks at a lower per-token compute cost than a dense model of the same total size. The family consists of two primary versions:
- Holo3-122B-A10B (The Flagship): This high-capacity model currently holds the top score on the OSWorld-Verified benchmark. It is available exclusively through H Company's platform, priced at $0.40 per million input tokens and $3.00 per million output tokens.
- Holo3-35B-A3B (The Open Version): Released under the Apache 2.0 license, this model is available on Hugging Face. It offers a more accessible entry point for developers, with a free Inference API tier (limited to 10 RPM, requests per minute) and a paid tier at $0.25 per million input tokens and $1.80 per million output tokens.
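To make the pricing concrete, here is a minimal cost-estimation sketch. It uses the per-million-token prices quoted above and assumes the open model's $0.25/$1.80 figures are input and output prices respectively. The `Pricing` class, the `estimate_cost` helper, and the token counts in the example are hypothetical and not part of any H Company SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens


# Prices as quoted in this article; treat them as a snapshot, not a contract.
PRICING = {
    "Holo3-122B-A10B": Pricing(input_per_million=0.40, output_per_million=3.00),
    # Assumption: $0.25 is the input price and $1.80 the output price.
    "Holo3-35B-A3B": Pricing(input_per_million=0.25, output_per_million=1.80),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one workload on the given model."""
    price = PRICING[model]
    total = (input_tokens * price.input_per_million
             + output_tokens * price.output_per_million)
    return total / 1_000_000


# Hypothetical agent episode: 50 steps, 4,000 input and 200 output tokens per step.
steps, in_per_step, out_per_step = 50, 4_000, 200
for name in PRICING:
    cost = estimate_cost(name, steps * in_per_step, steps * out_per_step)
    print(f"{name}: ${cost:.3f} per episode")
```

Because GUI agents typically send far more tokens than they generate (screenshots and history in, short actions out), the input price tends to dominate the bill even though the output price is several times higher.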
Three-Stage Training Loop
What sets Holo3 apart is its training methodology. H Company used a closed-loop system with three distinct phases so that the model can handle real-world unpredictability:
- Synthetic Navigation Generation: The model starts by learning from high-quality, synthetic examples of interface navigation based on predefined scenarios.
- Expansion Beyond Initial Conditions: To prevent overfitting and prepare the model for "edge cases," the synthetic data is expanded to include non-standard situations and unexpected UI behaviors.
- Curated Selection and RL: Finally, all training examples undergo a curated selection process and are refined through Reinforcement Learning (RL) to optimize for success rates and safety.
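The shape of this loop can be illustrated with a toy sketch. This is not H Company's pipeline: the `Trajectory` type, the function names, the perturbation labels, and the 70% success probability are all made up for illustration, and the reinforcement learning step is left out entirely. The sketch only shows how data flows from generation, through expansion, into curation.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Trajectory:
    scenario: str
    perturbation: str  # "none" for the unmodified scenario
    succeeded: bool    # outcome reported by a hypothetical task verifier


def generate_synthetic(scenarios):
    """Stage 1: one clean trajectory per predefined scenario."""
    return [Trajectory(s, "none", True) for s in scenarios]


def expand(trajectories, perturbations, rng):
    """Stage 2: add a perturbed variant of every trajectory for each edge case.

    Toy assumption: a perturbed run succeeds with probability 0.7.
    """
    expanded = list(trajectories)
    for t in trajectories:
        for p in perturbations:
            expanded.append(Trajectory(t.scenario, p, rng.random() < 0.7))
    return expanded


def curate(trajectories):
    """Stage 3 (selection only): keep trajectories the verifier marked as successful.

    A real pipeline would follow this with an RL update; that step is omitted here.
    """
    return [t for t in trajectories if t.succeeded]


rng = random.Random(0)  # fixed seed so the toy run is repeatable
base = generate_synthetic(["checkout", "refund"])
expanded = expand(base, ["popup", "slow_load", "layout_shift"], rng)
kept = curate(expanded)
print(f"{len(base)} base -> {len(expanded)} expanded -> {len(kept)} kept")
```

The point of the middle stage is visible in the counts: each scenario fans out into several variants, and only the verified successes survive to the final training set.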
To support this massive undertaking, H Company developed a proprietary "Corporate Environment Generator," where agents build entire web applications from scratch, creating a playground of verifiable tasks for the model to master.
Performance: Breaking Benchmarks
The flagship Holo3-122B-A10B model has already claimed the top spot on the OSWorld-Verified benchmark, scoring a record 78.85%. This benchmark is widely regarded as one of the most rigorous tests of desktop interaction, requiring the model to solve multi-step problems across various operating systems and applications.
H Company also introduced the H Corporate Benchmarks, featuring 486 multi-step tasks across four critical business categories:
- E-commerce: Handling complex purchasing and logistics workflows.
- Business Software: Interacting with ERPs and CRMs.
- Collaboration Tools: Coordinating across project management suites.
- Cross-App Scenarios: The "Holy Grail" of agents, requiring coordination across several systems within a single task (e.g., extracting data from a PDF, matching it against a budget in Excel, and sending automated emails in Outlook).
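Benchmarks of this kind are usually reported as a success rate, both per category and overall. The sketch below shows one plausible way to aggregate pass/fail results. The `success_rates` helper and the six sample results are hypothetical and are not H Company's data or scoring code.

```python
from collections import defaultdict


def success_rates(results):
    """Aggregate (category, passed) pairs into per-category and overall rates."""
    totals = defaultdict(int)
    passed = defaultdict(int)
    for category, ok in results:
        totals[category] += 1
        passed[category] += int(ok)
    per_category = {c: passed[c] / totals[c] for c in totals}
    n = sum(totals.values())
    overall = sum(passed.values()) / n if n else 0.0
    return per_category, overall


# Made-up results for illustration only.
sample = [
    ("e-commerce", True),
    ("e-commerce", False),
    ("business-software", True),
    ("collaboration", True),
    ("cross-app", False),
    ("cross-app", True),
]

per_category, overall = success_rates(sample)
for category, rate in per_category.items():
    print(f"{category}: {rate:.0%}")
print(f"overall: {overall:.0%}")
```

Note that the overall figure here weights every task equally, so a category with more tasks has more influence on the headline number than a small one.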
Conclusion
Holo3 isn't just an improvement over existing models; it is a specialized tool built for a future where AI handles the "busy work" of desktop computing. By combining a powerhouse MoE architecture with a sophisticated synthetic training environment, H Company has moved us closer to a world where agents can seamlessly navigate the software we use every day.