Qwen3-Max-Thinking: Perfect Reasoning Scores

Introduction

Alibaba has unveiled Qwen3-Max-Thinking, a reasoning-focused variant of its flagship Qwen3-Max model that achieved perfect scores on elite mathematics competitions. The model scored 100% on both AIME 2025 (American Invitational Mathematics Examination) and HMMT (Harvard-MIT Mathematics Tournament), matching the performance of OpenAI's top model on these reasoning benchmarks. Built on the trillion-parameter Qwen3-Max foundation, the thinking variant emphasizes deliberate, step-by-step solutions to complex problems in algebra, number theory, and probability.

Key achievements

Perfect scores on elite benchmarks

AIME 2025: 100% score on the American Invitational Mathematics Examination
HMMT: 100% score on the Harvard-MIT Mathematics Tournament
Performance parity: Matches OpenAI's top model on reasoning tests
Research significance: Elite mathematics contests are regarded as strong proxies for advanced reasoning capabilities
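To make the "100% score" concrete, the following is a minimal sketch of exact-match scoring, the usual way answer-based contests such as AIME (whose answers are integers from 0 to 999) are turned into an accuracy figure. The article does not describe Alibaba's grading protocol, so the function name and the answer lists here are illustrative placeholders, not real contest data.

```python
def exact_match_accuracy(predicted, reference):
    """Return the fraction of problems whose predicted answer equals the reference."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("need at least one problem to score")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


# Placeholder integer answers for illustration only (NOT real contest answers).
reference_answers = [12, 345, 678, 90, 5]
perfect_run = [12, 345, 678, 90, 5]   # every answer matches
one_miss_run = [12, 345, 678, 90, 6]  # last answer is wrong

if __name__ == "__main__":
    print(f"perfect run:  {exact_match_accuracy(perfect_run, reference_answers):.0%}")
    print(f"one-miss run: {exact_match_accuracy(one_miss_run, reference_answers):.0%}")
```

Under this scoring a perfect result means every final answer matches exactly; a single wrong answer on a five-problem set already drops the score to 80%.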

Technical capabilities

Built on Qwen3-Max: Leverages the trillion-parameter flagship model architecture
Step-by-step reasoning: Emphasizes deliberate, traceable solutions for complex problems
Target domains: Optimized for algebra, number theory, and probability challenges
Controllable latency: Allows tuning for accuracy vs. speed trade-offs

Competitive positioning

Alibaba claims Qwen3-Max-Thinking matches or exceeds the performance of leading models:

Claude Opus 4 (Anthropic)
DeepSeek V3.1 (DeepSeek AI)
Grok 4 (xAI)
GPT-5 Pro (OpenAI)

Real-world validation: crypto trading trial

A live trading experiment tested the model outside academic benchmarks:

Return: 22.3% over two weeks on a $10,000 crypto portfolio
Outperformance: Beat DeepSeek (4.9%) and several US models, which booked losses
Implication: Suggests the model's reasoning carries over to tasks beyond academic benchmarks
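The reported returns can be translated into dollar figures with a short calculation. This sketch assumes simple (non-compounded) returns over the two weeks and assumes that DeepSeek started from the same $10,000; the article states the starting capital only for the Qwen3-Max-Thinking portfolio. The function and variable names are illustrative.

```python
from decimal import Decimal

# Starting capital in USD, as reported for the trial.
START_CAPITAL = Decimal("10000")

# Two-week returns reported in the article, in percent.
# Assumption: both models started from the same capital.
REPORTED_RETURNS_PCT = {
    "Qwen3-Max-Thinking": Decimal("22.3"),
    "DeepSeek": Decimal("4.9"),
}


def ending_value(start, return_pct):
    """Portfolio value after applying a simple percentage return."""
    return start * (Decimal("1") + return_pct / Decimal("100"))


def summarize(start, returns_pct):
    """Map each model name to its ending portfolio value."""
    return {name: ending_value(start, pct) for name, pct in returns_pct.items()}


if __name__ == "__main__":
    values = summarize(START_CAPITAL, REPORTED_RETURNS_PCT)
    for name, value in values.items():
        print(f"{name}: ${value:,.2f}")
    gap = values["Qwen3-Max-Thinking"] - values["DeepSeek"]
    print(f"Gap: ${gap:,.2f}")
```

Under these assumptions the Qwen3-Max-Thinking portfolio ends at $12,230 and the DeepSeek portfolio at $10,490, a gap of $1,740.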

Availability and access

Qwen web chatbot: Available for testing via the official Qwen interface
Alibaba Cloud APIs: Enterprise access through Alibaba Cloud's platform
Early access: Supports tool use and stepwise reasoning on technical tasks
Use cases: Finance, research, and operations requiring reliability and auditability

Why it matters

Perfect scores on elite mathematics competitions represent a significant milestone in AI reasoning capabilities. These benchmarks are particularly valuable because they require sophisticated problem-solving, multi-step reasoning, and mathematical intuition, all qualities essential for real-world applications in research, finance, and complex decision-making. The crypto trading trial offers early evidence of the model's practical utility beyond academic benchmarks.

The emphasis on step-by-step, traceable solutions addresses a critical need in enterprise applications, where auditability and reliability are paramount. This positions Qwen3-Max-Thinking as a strong candidate for workloads that require both high accuracy and explainable reasoning.

Future developments

Alibaba researchers plan further improvements:

Broader task coverage: Expanding capabilities without diluting peak mathematics performance
Multilingual reasoning: Extending reasoning capabilities across languages
Safety alignment: Enhanced safety measures for production deployment
Robustness: Improved performance under distribution shift
Community tracking: Benchmarks and contests to monitor progress

Conclusion

Qwen3-Max-Thinking represents a significant achievement in AI reasoning, showing that a model can achieve perfect scores on elite mathematics competitions while remaining useful in real-world applications. The combination of benchmark results (AIME 2025, HMMT) and practical validation (the crypto trading trial) suggests strong potential for enterprise applications requiring both high accuracy and traceable reasoning.

As reasoning capabilities continue to improve across the AI landscape, models like Qwen3-Max-Thinking are raising the bar for mathematical problem-solving and complex decision-making. For developers and researchers building reasoning-intensive applications, it is an important step toward accessible, high-performance reasoning tools.

Explore more about Qwen models and reasoning capabilities in our Models catalog.

Sources

Qwen3-Max-Thinking hits perfect scores as Alibaba raises the bar on AI reasoning — Digital Watch