Introduction
In a surprising turn of events in the AI landscape, Zoom has announced a major breakthrough in artificial intelligence performance. On December 10, 2025, Zoom revealed that its AI system achieved a score of 48.1% on "Humanity's Last Exam" (HLE), setting a new state-of-the-art (SOTA) result on the benchmark.
This achievement places Zoom ahead of previous leaders, including Google's Gemini 3 Pro, which held the top spot with 45.8%. As AI models continue to evolve, HLE has emerged as the gold standard for measuring true expert-level reasoning capabilities, making Zoom's performance particularly significant for the industry.
What is Humanity's Last Exam (HLE)?
Humanity's Last Exam is widely regarded as one of the most difficult and comprehensive tests for artificial intelligence. Unlike standard benchmarks that often rely on rote memorization or simple pattern recognition, HLE is designed to evaluate deep understanding and multi-step reasoning.
Key characteristics of the HLE benchmark include:
- Graduate-Level Difficulty: It consists of approximately 3,000 questions that require expert-level knowledge.
- Broad Scope: The questions span over 100 distinct academic disciplines.
- Reasoning First: The questions are crafted to be non-searchable, forcing models to derive answers through logic and synthesis rather than retrieving pre-existing text from the internet.
Historically, human experts score around 90% on this exam, while leading AI models have struggled to crack the 40% barrier until recently.
The Winning Strategy: Federated AI
Zoom's success on HLE wasn't achieved by simply training a larger model. Instead, the company employed a Federated AI approach. This strategy moves away from the "one model to rule them all" philosophy and instead leverages a collaborative system of multiple models.
Explore-Verify-Federate
At the core of this approach is an agentic workflow described as "explore-verify-federate".
- Explore: The system explores multiple reasoning paths to approach a problem.
- Verify: It rigorously checks these paths against known constraints and logic.
- Federate: The results are synthesized to produce a final, high-confidence answer.
This method allows Zoom's AI to tackle the complex, multi-layered problems found in HLE by effectively "thinking" through them in a way that single-shot inference often fails to do.
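To make the three stages concrete, here is a minimal sketch of an explore-verify-federate loop. Everything in it is hypothetical: the toy question, the stand-in "models", and the divisibility check used as a verifier are illustrative assumptions, not Zoom's published implementation.

```python
from collections import Counter

def explore(question, models):
    """Explore: collect a candidate answer (reasoning path) from each model."""
    return [model(question) for model in models]

def verify(candidates, constraint):
    """Verify: keep only candidates that satisfy a known constraint."""
    return [c for c in candidates if constraint(c)]

def federate(candidates):
    """Federate: synthesize a final answer, here by simple majority vote."""
    if not candidates:
        return None
    answer, _count = Counter(candidates).most_common(1)[0]
    return answer

# Three hypothetical "models" answering "what is 17 * 23?"; one is wrong.
models = [lambda q: 391, lambda q: 391, lambda q: 381]
candidates = explore("17 * 23", models)
# Verifier: a correct product of 17 and anything must be divisible by 17.
verified = verify(candidates, lambda c: c % 17 == 0)
print(federate(verified))  # prints 391
```

In a real system each stage would be far richer (the "models" would be LLM calls, verification would check logical consistency rather than arithmetic), but the shape of the pipeline is the same: generate many paths, filter them against constraints, then aggregate the survivors into one high-confidence answer.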
Benchmark Results
The new leaderboard standings highlight how competitive the field has become. Zoom's 48.1% score is a notable leap over the competition.
- Zoom AI: 48.1%
- Google Gemini 3 Pro (with tools): 45.8%
- GPT-5.2 / Claude Opus 4.5: ~30-40% range
While other emerging models like xAI's Grok-4 Heavy are also showing impressive results on separate leaderboards, Zoom's performance on HLE underscores the effectiveness of agentic workflows and federated architectures over raw model size alone.
Real-World Impact: Solving Tomorrow's Challenges Today
For the average Zoom user, this might seem like abstract academic progress, but the implications are practical and immediate. The same underlying technology that powers Zoom's success on HLE is being integrated into Zoom AI Companion.
Updates to the platform include:
- More Accurate Summaries: Meeting notes will capture nuance and context with greater precision.
- Action Item Extraction: The AI can better identify complex tasks and assign ownership.
- Complex Workflow Automation: Agentic capabilities allow for handling multi-step business processes that require reasoning, not just execution.
- Cross-Platform Retrieval: Enhanced ability to synthesize information from various data sources (chats, emails, docs).
Conclusion
Zoom's record-breaking performance on Humanity's Last Exam serves as a reminder that innovation in AI is not solely the domain of foundational model labs like OpenAI or Google DeepMind. By applying a federated, agentic approach to existing strong models, Zoom has demonstrated that how you use AI is just as important as the model itself.
As we move into 2026, we can expect this trend of specialized, reasoning-focused architectures to dominate the next wave of AI development, bringing us closer to bridging the gap between AI and human expert performance.