Introduction
Alibaba has unveiled Qwen3-Max-Thinking, a reasoning-focused variant of its flagship Qwen3-Max model that achieved perfect scores on two elite mathematics competitions. The model scored 100% on both AIME 2025 (American Invitational Mathematics Examination) and HMMT (Harvard-MIT Mathematics Tournament), matching the performance of OpenAI's top model on these reasoning benchmarks. Built on the trillion-parameter Qwen3-Max foundation, the thinking variant emphasizes deliberate, step-by-step solutions to complex problems in algebra, number theory, and probability.
Key achievements
Perfect scores on elite benchmarks
- AIME 2025: 100% score on the American Invitational Mathematics Examination
- HMMT: 100% score on the Harvard-MIT Mathematics Tournament
- Performance parity: Matches OpenAI's top model on both benchmarks
- Research significance: Elite mathematics contests are regarded as strong proxies for advanced reasoning capabilities
Technical capabilities
- Built on Qwen3-Max: Leverages the trillion-parameter flagship model architecture
- Step-by-step reasoning: Emphasizes deliberate, traceable solutions for complex problems
- Target domains: Optimized for algebra, number theory, and probability challenges
- Controllable latency: Allows tuning for accuracy vs. speed trade-offs
Competitive positioning
Alibaba claims Qwen3-Max-Thinking matches or exceeds the performance of these leading models:
- Claude Opus 4 (Anthropic)
- DeepSeek V3.1 (DeepSeek AI)
- Grok 4 (xAI)
- GPT-5 Pro (OpenAI)
Real-world validation: crypto trading trial
In a live trading experiment, the model posted the following results:
- Return: 22.3% over two weeks on a $10,000 crypto portfolio
- Outperformance: Beat DeepSeek (4.9% return) and several US models, which booked losses
- Implication: Suggests the model's reasoning carries over to tasks beyond academic benchmarks
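To put the percentages in dollar terms, the short sketch below applies each reported return to the starting capital. It assumes a simple (non-compounded) return over the two-week window and assumes that DeepSeek started from the same $10,000, which the trial summary above does not state explicitly. The function and variable names are illustrative only.

```python
def ending_value(start: float, return_pct: float) -> float:
    """Portfolio value after applying a simple percentage return."""
    return start * (1 + return_pct / 100)


START = 10_000.0  # starting capital reported for the trial

qwen_end = ending_value(START, 22.3)
# Assumption: DeepSeek's 4.9% return was earned on the same starting capital.
deepseek_end = ending_value(START, 4.9)
gap = qwen_end - deepseek_end

print(f"Qwen3-Max-Thinking: ${qwen_end:,.2f}")
print(f"DeepSeek:           ${deepseek_end:,.2f}")
print(f"Difference:         ${gap:,.2f}")
```

Under these assumptions, a 22.3% return turns $10,000 into $12,230, against $10,490 at 4.9%, a gap of $1,740 over the two weeks.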
Availability and access
- Qwen web chatbot: Available for testing via the official Qwen interface
- Alibaba Cloud APIs: Enterprise access through Alibaba Cloud's platform
- Early access: Supports tool use and stepwise reasoning on technical tasks
- Use cases: Finance, research, and operations requiring reliability and auditability
Why it matters
Perfect scores on elite mathematics competitions mark a significant milestone in AI reasoning. These benchmarks are valuable because they require sophisticated problem-solving, multi-step reasoning, and mathematical intuition, all qualities essential for real-world applications in research, finance, and complex decision-making. The crypto trading trial adds evidence of the model's practical utility beyond academic benchmarks.
The emphasis on step-by-step, traceable solutions addresses a critical need for enterprise applications where auditability and reliability are paramount. This positions Qwen3-Max-Thinking as a strong candidate for applications requiring both high accuracy and explainable reasoning processes.
Future developments
Alibaba researchers plan further improvements:
- Broader task coverage: Expanding capabilities without diluting peak mathematics performance
- Multilingual reasoning: Extending reasoning capabilities across languages
- Safety alignment: Enhanced safety measures for production deployment
- Robustness: Improved performance under distribution shift
- Community tracking: Benchmarks and contests to monitor progress
Conclusion
Qwen3-Max-Thinking shows that a model can achieve perfect scores on elite mathematics competitions while remaining useful in real-world applications. The combination of benchmark results (AIME 2025, HMMT) and practical results (the crypto trading trial) suggests strong potential for enterprise applications requiring both high accuracy and traceable reasoning.
As reasoning capabilities improve across the AI landscape, models like Qwen3-Max-Thinking are raising the bar for mathematical problem-solving and complex decision-making. For developers and researchers building reasoning-intensive applications, it is an important step toward accessible, high-performance reasoning tools.