Qwen 3.7-Max: Alibaba's Long-Horizon Agent Engine

Alibaba launches Qwen 3.7-Max, a flagship AI model demonstrating 35-hour autonomous operation, 10x kernel speedup, and cross-agent generalization.

by HowAIWorks Team
Tags: Alibaba, Qwen 3.7-Max, AI Models, Agentic AI, Software Engineering, AI Research, Benchmarks, Model Context Protocol

Introduction

Alibaba Cloud has officially launched Qwen 3.7-Max, its latest flagship model specifically designed for the "agent era." Building upon the success of its predecessors, Qwen 3.7-Max shifts the focus from simple text generation and basic chat interactions to long-horizon autonomy, complex coding, and multi-step tool execution.

The highlight of the release is Qwen 3.7-Max's ability to stay coherent and productive over extended periods without human intervention. In Alibaba's headline demonstration, the model worked autonomously for 35 hours to optimize an attention kernel, making 1,158 tool calls and reaching a 10.0x geometric mean speedup over the reference implementation. Alibaba presents this as evidence that AI agents can move from simple assistants to independent engineering partners.

The 35-Hour Autonomous Kernel Optimization

To demonstrate the real-world utility of Qwen 3.7-Max, the development team tested it on a highly specialized coding task: optimizing the Extend Attention Kernel (part of SGLang) for a T-Head ZW-M890 PPU. According to the team, the model had never encountered this hardware architecture during training, and it was given no documentation or example kernels for it.

Working against this unfamiliar target architecture, the model carried out the optimization autonomously. Highlights of the run:

  • Continuous Autonomy: The model ran for 35 hours without human guidance and, by Alibaba's account, did not get stuck in repetitive loops or show degraded reasoning over that time.
  • Extensive Tool Interaction: Qwen 3.7-Max made 1,158 tool calls and conducted 432 kernel evaluations.
  • Self-Improving Cycle: The agent repeated a cycle of compiling, profiling to locate performance bottlenecks, adjusting the code, and re-compiling, keeping the changes that made the kernel faster.
  • 10.0x Performance Improvement: Ultimately, the autonomous optimization loop delivered a 10.0x geometric mean speedup compared to the reference Triton implementation.

The run suggests that, given a single, clearly defined objective and a way to measure progress, Qwen 3.7-Max can operate as an effective, self-correcting engineering agent.
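The compile, profile, and adjust cycle and the geometric mean metric described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the variant names, and all timings below are made up, and `toy_benchmark` stands in for a real compile-and-profile step. It is not Alibaba's harness.

```python
import math
from typing import Callable, Iterable, Optional, Sequence, Tuple


def geometric_mean_speedup(baseline_ms: Sequence[float],
                           candidate_ms: Sequence[float]) -> float:
    """Geometric mean of per-workload speedups (baseline time / candidate time)."""
    if not baseline_ms or len(baseline_ms) != len(candidate_ms):
        raise ValueError("need equal-length, non-empty timing lists")
    log_sum = sum(math.log(b / c) for b, c in zip(baseline_ms, candidate_ms))
    return math.exp(log_sum / len(baseline_ms))


def optimize(baseline_ms: Sequence[float],
             candidates: Iterable[str],
             benchmark: Callable[[str], Sequence[float]]
             ) -> Tuple[Optional[str], float, int]:
    """Evaluate each candidate once and keep the best one seen so far."""
    best_name, best_speedup, evaluations = None, 1.0, 0
    for name in candidates:
        timings = benchmark(name)  # hypothetical "compile + profile" step
        evaluations += 1
        speedup = geometric_mean_speedup(baseline_ms, timings)
        if speedup > best_speedup:  # regressions are simply discarded
            best_name, best_speedup = name, speedup
    return best_name, best_speedup, evaluations


# Made-up timings in milliseconds for three workloads.
BASELINE_MS = [8.0, 20.0, 50.0]
TOY_TIMINGS = {
    "variant_a": [4.0, 10.0, 25.0],   # 2x on every workload
    "variant_b": [2.0, 2.0, 2.0],     # 4x, 10x, and 25x
    "variant_c": [16.0, 20.0, 50.0],  # a regression on the first workload
}


def toy_benchmark(name: str) -> Sequence[float]:
    """Look up canned timings instead of compiling and profiling a kernel."""
    return TOY_TIMINGS[name]


best, speedup, evals = optimize(BASELINE_MS, TOY_TIMINGS, toy_benchmark)
print(best, round(speedup, 2), evals)
```

In this toy data, `variant_b` has per-workload speedups of 4x, 10x, and 25x. Their geometric mean is 10x, while their arithmetic mean would be 13x, which is why a geometric mean is the more conservative way to summarize speedup ratios.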

Generalizing Agentic Abilities Across Environments

One of the most notable claims made by the Qwen team is that agentic capabilities can generalize across diverse environments, much as language models generalize vocabulary and syntax from diverse text corpora.

Historically, AI agents have been brittle, often failing when moved to a new framework, tool set, or API. The Qwen team addresses this by separating training data into three distinct components:

  1. The Task: The core problem to be solved.
  2. The Execution Environment: The terminal, browser, or virtual machine where actions occur.
  3. The Validator: The test suite or evaluation metrics determining success.

By training on randomized combinations of tasks, environments, and validation structures, Qwen 3.7-Max learned to generalize actions rather than memorizing a specific environment. This allows it to:

  • Transfer Patterns: Apply action patterns learned in one tool or scaffold to completely new tasks and tools.
  • Maintain Scaffold-Agnostic Performance: Perform consistently across multiple developer scaffolds and agents (e.g., Claude Code, OpenClaw, and Qwen Code).
  • Leverage MCP: Seamlessly integrate with the Model Context Protocol to consume external tools and data sources.
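The idea of training on randomized combinations of task, environment, and validator can be illustrated with a small sampler. This is a hedged sketch, not the Qwen team's pipeline: the `Episode` type, the example tasks, and the validator names are hypothetical, and a real system would also need to filter out combinations that make no sense together.

```python
import itertools
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Episode:
    """One training episode: what to do, where to do it, how success is judged."""
    task: str
    environment: str
    validator: str


# Hypothetical pools; only the three environment types come from the article.
TASKS = ["fix a failing unit test", "rename a config key", "summarize a log file"]
ENVIRONMENTS = ["terminal", "browser", "virtual machine"]
VALIDATORS = ["test suite", "output diff", "rubric score"]


def all_combinations() -> List[Episode]:
    """Every task paired with every environment and every validator."""
    return [Episode(t, e, v)
            for t, e, v in itertools.product(TASKS, ENVIRONMENTS, VALIDATORS)]


def sample_episodes(n: int, seed: int = 0) -> List[Episode]:
    """Draw n episodes, choosing each component independently at random."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    return [Episode(rng.choice(TASKS),
                    rng.choice(ENVIRONMENTS),
                    rng.choice(VALIDATORS))
            for _ in range(n)]


if __name__ == "__main__":
    print(len(all_combinations()), "possible combinations")
    for episode in sample_episodes(3, seed=42):
        print(episode)
```

Because each component is drawn independently, the same task shows up under different environments and validators, so a model cannot succeed by memorizing one fixed pairing.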

Benchmark Achievements

Qwen 3.7-Max posts strong reported scores across a range of coding, engineering, and agentic benchmarks:

  • SWE-bench Verified: Scored 80.4, demonstrating high proficiency in resolving real-world GitHub issues.
  • SWE-bench Pro: Reached 60.6 on this separate, more challenging benchmark.
  • Terminal Bench 2.0-Terminus: Scored 69.7, showing robust CLI usage and system administration capabilities.
  • SciCode: Achieved 53.5 in solving scientific and mathematical coding problems.

Additionally, on KernelBench Level 3 (L3), Qwen 3.7-Max achieved a 1.98x median speedup over the PyTorch reference implementations and outperformed standard torch.compile optimization in 96% of cases.
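For readers unfamiliar with how figures like a median speedup and a win rate against torch.compile are derived, here is a minimal sketch. It assumes speedup means reference time divided by generated-kernel time and that a "win" means the generated kernel is strictly faster than the compiled baseline. The function name and all timings are made up; this is not KernelBench's own scoring code.

```python
from statistics import median
from typing import Sequence, Tuple


def summarize(reference_ms: Sequence[float],
              compiled_ms: Sequence[float],
              kernel_ms: Sequence[float]) -> Tuple[float, float]:
    """Return (median speedup over the reference, win rate over the compiled baseline)."""
    if not kernel_ms or not (len(reference_ms) == len(compiled_ms) == len(kernel_ms)):
        raise ValueError("need three equal-length, non-empty timing lists")
    # Assumed definition: speedup = reference time / generated-kernel time.
    speedups = [r / k for r, k in zip(reference_ms, kernel_ms)]
    # Assumed definition: a win is a strictly lower time than the compiled baseline.
    wins = sum(k < c for k, c in zip(kernel_ms, compiled_ms))
    return median(speedups), wins / len(kernel_ms)


# Made-up timings in milliseconds for four problems.
reference = [10.0, 20.0, 30.0, 40.0]
compiled = [8.0, 15.0, 20.0, 41.0]
generated = [5.0, 10.0, 20.0, 16.0]

median_speedup, win_rate = summarize(reference, compiled, generated)
print(f"median speedup: {median_speedup:.2f}x, win rate: {win_rate:.0%}")
```

With this toy data the per-problem speedups are 2.0x, 2.0x, 1.5x, and 2.5x, giving a 2.0x median, and the generated kernel beats the compiled baseline on three of four problems. A median is less sensitive than a mean to a handful of problems with very large speedups.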

How to Access Qwen 3.7-Max

The model is now available to developers and enterprises through several interfaces:

Conclusion

Qwen 3.7-Max marks a notable step forward in agentic AI. Rather than functioning only as a chatbot or a code autocomplete engine, it is positioned as a long-horizon engine for autonomous, goal-directed problem solving, with a demonstrated run of 35 hours. If the claim that agentic behavior generalizes across environments holds up in wider use, it points toward more robust and flexible AI workflows.

To explore how these capabilities are transforming the software development landscape, check out our glossary terms on AI Agents and the Model Context Protocol.

Frequently Asked Questions

Qwen 3.7-Max is Alibaba's flagship AI model optimized for autonomous software engineering, tool use, and long-horizon agentic workflows.
In the 35-hour demonstration, the model ran autonomously and made 1,158 tool calls, reaching a 10.0x geometric mean speedup in attention kernel performance.
Qwen 3.7-Max generalizes agentic abilities learned from diverse training environments, allowing it to adapt to new tools, tasks, and scaffolds without task-specific tuning.
