Qwen 3.5: Scaling Intelligence in Compact Models

Alibaba's new Qwen 3.5 series packs flagship intelligence into compact sizes (0.8B to 9B), featuring native multimodality and enhanced agentic capabilities.

by HowAIWorks Team
Tags: Alibaba, Qwen, LLM, Open Source AI, Multimodal AI, Edge AI, Agentic AI, Machine Learning

Introduction

Alibaba's Qwen team has continued its rapid release cycle with the introduction of the Qwen 3.5 compact lineup. This new release focuses on bringing flagship-level intelligence to smaller, more efficient form factors ranging from 0.8B to 9B parameters.

The hallmark of the Qwen 3.5 series is its incredible intelligence density. By leveraging architectural innovations like Gated Delta Networks and sparse Mixture-of-Experts (MoE), Alibaba has managed to create models that punch significantly above their weight class, often outperforming much larger models from previous generations.

The Compact Lineup: From Edge to Cloud

The new release provides a versatile range of models tailored for different deployment scenarios:

  • 0.8B and 2B: Optimized for edge devices, local applications, and ultra-fast inference. These models are ideal for on-device AI where privacy and latency are critical.
  • 4B: A "sweet spot" for lightweight multimodal agents. It offers a strong balance between footprint and reasoning capability, suitable for small-scale AI services.
  • 9B: The high-performance tier of the compact series. Despite its compact size, it approaches the quality of much larger systems and even surpasses the prior Qwen3-30B on several key benchmarks.

Key Innovations and Performance

The performance gains in Qwen 3.5 stem not only from better data but also from architectural refinements and scaled-up training techniques:

  • Native Multimodality: Unlike many compact models that add vision or audio capabilities via adapters, Qwen 3.5 models are natively multimodal from the ground up.
  • Improved Architecture: The use of Gated Delta Networks and sparse MoE allows for higher parameter counts with lower active computational costs.
  • Scaled RL Training: All models underwent extensive Reinforcement Learning (RL) based on reasoning signals, significantly improving their ability to follow complex instructions and perform multi-step tasks.
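The efficiency claim behind sparse MoE is that each token activates only a small subset of experts, so active compute stays low even as total parameter count grows. The following is a minimal illustrative sketch of top-k expert routing, not Qwen's actual architecture: the toy dimensions, linear "experts," and router are all assumptions for demonstration.

```python
import numpy as np

def sparse_moe_layer(x, expert_weights, gate_weights, top_k=2):
    """Route a token to its top-k experts; only those experts compute.

    x: (d,) token hidden state
    expert_weights: list of (d, d) matrices, one toy linear "expert" each
    gate_weights: (num_experts, d) router matrix
    """
    logits = gate_weights @ x                        # router score per expert
    top = np.argsort(logits)[-top_k:]                # indices of the top-k experts
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                             # softmax over the selected experts only
    # Only the top_k experts run; the other experts are skipped entirely,
    # which is why active FLOPs stay far below total parameter count.
    return sum(p * (expert_weights[i] @ x) for p, i in zip(probs, top))

rng = np.random.default_rng(0)
d, num_experts = 8, 16
experts = [rng.normal(size=(d, d)) for _ in range(num_experts)]
gate = rng.normal(size=(num_experts, d))
x = rng.normal(size=d)
y = sparse_moe_layer(x, experts, gate, top_k=2)
print(y.shape)  # prints (8,)
```

With top_k=2 of 16 experts, only 1/8 of the expert parameters participate in each forward step, which is the core trade-off sparse MoE exploits.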

Benchmark Highlights

The Qwen 3.5-9B model stands out as a particularly impressive achievement, scoring 82.5 on MMLU-Pro and 81.7 on GPQA Diamond. In vision tasks, it outperforms GPT-5-Nano on benchmarks like MMMU-Pro (70.1 vs 57.2) and MathVision (78.9 vs 62.2).

Availability and Ecosystem

In line with its commitment to open science, Alibaba has released both the Instruct and Base versions of these models under the Apache 2.0 license. This ensures that developers can freely integrate these high-performance compact models into their own applications.

The weights are available on Hugging Face and ModelScope, providing immediate access to the global AI community.

Conclusion

The Qwen 3.5 compact series proves that "small" no longer means "incapable." By delivering state-of-the-art benchmarks in models as small as 9B and 4B, Alibaba is democratizing access to high-quality AI for developers who don't have access to massive compute clusters. These models are set to become a staple for the next generation of edge-AI and agentic workflows.

Frequently Asked Questions

What model sizes does the Qwen 3.5 compact series include?
The Qwen 3.5 compact series includes 0.8B, 2B, 4B, and 9B parameter models, designed for everything from edge devices to light AI services.

Are the Qwen 3.5 models multimodal?
Yes, all models in the Qwen 3.5 family are natively multimodal, supporting text, image, and video processing across all sizes.

How does the 9B model compare to larger models?
The 9B model outperforms the roughly 3x larger Qwen3-30B on major benchmarks like MMLU-Pro and GPQA Diamond, demonstrating significant intelligence density.

What license are the Qwen 3.5 models released under?
The Qwen 3.5 compact models are released under the permissive Apache 2.0 license, making them accessible for both research and commercial use.
