Qwen 3.7-Max: Alibaba's Long-Horizon Agent Engine

Introduction

Alibaba Cloud has officially launched Qwen 3.7-Max, its latest flagship model specifically designed for the "agent era." Building upon the success of its predecessors, Qwen 3.7-Max shifts the focus from simple text generation and basic chat interactions to long-horizon autonomy, complex coding, and multi-step tool execution.

The highlight of the release is Qwen 3.7-Max's ability to maintain high coherence and operational productivity over extended periods without human intervention. During a landmark demonstration, the model autonomously worked for 35 hours to optimize a software project's attention mechanism kernel, executing over a thousand tool calls and achieving a 10x speedup. Alibaba claims this demonstrates how AI agents can transition from simple assistants to independent engineering partners.

The 35-Hour Autonomous Kernel Optimization

To demonstrate the real-world utility of Qwen 3.7-Max, the development team put it to the test on a complex, highly specialized coding task: optimizing the Extend Attention Kernel (part of SGLang) for a T-Head ZW-M890 PPU—a hardware architecture the model had never encountered during its training, with no prior documentation or example kernels provided.

Faced with a completely unfamiliar target architecture, the model undertook the optimization process autonomously. The run highlights include:

Continuous Autonomy: The model ran for 35 hours continuously without human guidance, showing significant resilience against logical loops or decay in reasoning capacity.
Extensive Tool Interaction: Qwen 3.7-Max made 1,158 tool calls and conducted 432 kernel evaluations.
Self-Improving Cycle: The agent worked through a continuous cycle of compiling, profiling to measure performance bottlenecks, writing code adjustments, and re-compiling until it achieved optimal results.
10.0x Performance Improvement: Ultimately, the autonomous optimization loop delivered a 10.0x geometric mean speedup compared to the reference Triton implementation.

This shows that given a single, clearly defined objective, Qwen 3.7-Max can operate as a highly effective, self-correcting engineering agent.

Generalizing Agentic Abilities Across Environments

One of the most notable claims made by the Qwen team is that agentic capabilities can generalize across diverse environments in a similar manner to how language models generalize vocabulary and syntax from diverse text corpora.

Historically, AI agents have been brittle, often failing when moved to a new framework, tool set, or API. Qwen 3.7-Max addresses this by separating training data into three distinct components:

The Task: The core problem to be solved.
The Execution Environment: The terminal, browser, or virtual machine where actions occur.
The Validator: The test suite or evaluation metrics determining success.

By training on randomized combinations of tasks, environments, and validation structures, Qwen 3.7-Max learned to generalize actions rather than memorizing a specific environment. This allows it to:

Transfer Patterns: Apply action patterns learned in one tool or scaffold to completely new tasks and tools.
Maintain Scaffold-Agnostic Performance: Perform consistently across multiple developer scaffolds and agents (e.g., Claude Code, OpenClaw, and Qwen Code).
Leverage MCP: Seamlessly integrate with the Model Context Protocol to consume external tools and data sources.

Benchmark Achievements

Qwen 3.7-Max establishes strong scores across a range of coding, engineering, and agentic benchmarks:

SWE-bench Verified: Scored 80.4, demonstrating high proficiency in resolving real-world GitHub issues.
SWE-bench Pro: Reached 60.6 on the more challenging Pro subset.
Terminal Bench 2.0-Terminus: Scored 69.7, showing robust CLI usage and system administration capabilities.
SciCode: Achieved 53.5 in solving scientific and mathematical coding problems.

Additionally, in KernelBench L3, Qwen 3.7-Max achieved a 1.98x median speedup over PyTorch reference implementations, outperforming standard torch.compile optimization in 96% of cases.

How to Access Qwen 3.7-Max

The model is now available to developers and enterprises through several interfaces:

Qwen Studio: Experience the model directly in the chat interface via Qwen Studio.
Qwen API: Integrate Qwen 3.7-Max into your applications using the Alibaba Cloud Model Studio API.

Conclusion

Qwen 3.7-Max represents a significant leap forward in agentic AI. Rather than simply functioning as a chatbot or a code autocomplete engine, it serves as a long-horizon cognitive engine capable of autonomous, goal-directed problem solving over hours or days. By demonstrating that agentic behavior can generalize across environments, Alibaba is paving the way for more robust, flexible, and capable AI workflows.

To explore how these capabilities are transforming the software development landscape, check out our glossary terms on AI Agents and the Model Context Protocol.

Qwen 3.7-Max: Alibaba's Long-Horizon Agent Engine

Introduction

The 35-Hour Autonomous Kernel Optimization

Generalizing Agentic Abilities Across Environments

Benchmark Achievements

How to Access Qwen 3.7-Max

Conclusion

Sources

Frequently Asked Questions

What is Qwen 3.7-Max?

How long did Qwen 3.7-Max operate autonomously in the kernel optimization test?

How does Qwen 3.7-Max generalize its agent capabilities?

Related Articles

Google Antigravity Triples Gemini Request Limits

Embedded Language Flows: MIT Revitalizes Text Diffusion

Qwen-Scope: Alibaba's Open 'X-Ray' for Model Interpretability

Continue Your AI Journey