Introduction
Alibaba's Qwen team has officially unveiled the Qwen 3.5 Medium family, marking another significant leap in the efficiency and capability of open-weight models. This new series is designed to push the boundaries of what is possible with moderate-sized models, focusing heavily on reasoning, agentic performance, and massive context handling.
The lineup includes several variants tailored for different needs:
- Qwen3.5-Flash
- Qwen3.5-35B-A3B
- Qwen3.5-122B-A10B
- Qwen3.5-27B
The "Surprise" of the Series: Qwen3.5-35B-A3B
Perhaps the most impressive achievement in this release is the Qwen3.5-35B-A3B. According to official benchmarks, this model manages to outperform its massive predecessor, the Qwen3-235B-A22B-2507.
What makes this remarkable is the efficiency gain: the older model activates 22B parameters per forward pass, while the new 35B-A3B achieves superior results with only 3B active parameters. That is an improvement of more than 7x in active-parameter efficiency, proving that smarter training and architecture can outweigh raw scale.
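The efficiency claim follows directly from the active-parameter counts encoded in the model names, and can be sanity-checked with a quick calculation:

```python
# Active parameters per forward pass, read from the model names:
# Qwen3-235B-A22B activates 22B of 235B total; Qwen3.5-35B-A3B activates 3B of 35B.
old_active_b = 22  # billions of active parameters (Qwen3-235B-A22B)
new_active_b = 3   # billions of active parameters (Qwen3.5-35B-A3B)

ratio = old_active_b / new_active_b
print(f"Active-parameter reduction: {ratio:.1f}x")  # → 7.3x
```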
Tailored for Agents: Qwen3.5-Flash
For developers building AI agents and production-grade applications, Qwen3.5-Flash is the headline feature. It is a production-optimized version of the 35B-A3B model, specifically tuned for agentic scenarios.
Key features include:
- 1 Million Token Context Window: Out of the box, Flash can handle immense amounts of data, effectively removing the need for complex RAG (Retrieval-Augmented Generation) setups in many use cases.
- Native Function Calling: It supports tool use and API interactions natively, making it a robust backbone for complex autonomous tasks.
- Deep Reasoning Integration: Despite its "Flash" moniker, it retains strong logical foundations for multi-step execution.
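To make the function-calling point concrete, here is a minimal sketch of an OpenAI-style tools payload, the request format commonly accepted by Qwen-family serving stacks. The model identifier and the `get_weather` tool are illustrative assumptions, not details confirmed by the release:

```python
import json

# Hypothetical tool definition in the OpenAI-compatible "tools" format.
# The tool itself (get_weather) is an illustrative assumption.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

# Request body a client would POST to a chat-completions endpoint.
request_body = {
    "model": "qwen3.5-flash",  # assumed model identifier
    "messages": [{"role": "user", "content": "What's the weather in Hangzhou?"}],
    "tools": tools,
}

print(json.dumps(request_body, indent=2))
```

A model with native tool use responds to such a request with a structured tool call (name plus JSON arguments) rather than free text, which is what makes it a reliable backbone for autonomous agents.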
Advanced Reasoning and Planning
The remaining models in the family, Qwen3.5-122B-A10B and Qwen3.5-27B, are dedicated to the most complex tasks. These models are built for planning, long-term instruction following, and intricate chains of reasoning.
To achieve this level of performance, the Qwen team utilized a sophisticated four-stage post-training pipeline, whose stages include:
- Cold-start via long Chain-of-Thought (CoT) data.
- Reinforcement Learning (RL) based on high-quality reasoning signals.
- Planning optimization for multi-step task execution.
The 122B-A10B model, with only 10B active parameters, competes directly with much heavier dense models in terms of logical coherence and performance on reasoning benchmarks.
Availability and Licensing
Alibaba continues its commitment to the open-source community by releasing the model weights for Qwen 3.5 Medium under the Apache 2.0 License.
- Hugging Face: Weights are available for download and self-hosting.
- Model Studio: The Qwen3.5-Flash model is available via Alibaba Cloud's Model Studio, priced competitively at approximately $0.10 per 1 million input tokens and $0.40 per 1 million output tokens.
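At those list prices, the cost of even a long-context call is easy to estimate. A rough calculator using the quoted Model Studio rates (the example token counts are made up for illustration):

```python
# Quoted Model Studio rates for Qwen3.5-Flash (USD per 1M tokens).
INPUT_PRICE_PER_M = 0.10
OUTPUT_PRICE_PER_M = 0.40

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the approximate USD cost of a single API call."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Example: a full 1M-token context with a 2,000-token answer.
cost = estimate_cost(1_000_000, 2_000)
print(f"${cost:.4f}")  # → $0.1008
```

In other words, filling the entire 1-million-token context window costs on the order of ten cents per request at these rates.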
Conclusion
The release of Qwen 3.5 Medium proves that the next frontier of LLM development isn't just about size—it's about efficiency and specialization. By delivering 122B-level reasoning with 10B active parameters and providing a 1-million-token context in a "Flash" model, Alibaba is setting a new standard for accessible, high-performance AI.