Qwen 3.5: Scaling Intelligence in Compact Models

Alibaba's new Qwen 3.5 series packs flagship intelligence into compact sizes (0.8B to 9B), featuring native multimodality and enhanced agentic capabilities.

by HowAIWorks Team
Tags: Alibaba, Qwen, LLM, Open Source AI, Multimodal AI, Edge AI, Agentic AI, Machine Learning

Introduction

Alibaba's Qwen team has continued its rapid release cycle with the introduction of the Qwen 3.5 compact lineup. This new release focuses on bringing flagship-level intelligence to smaller, more efficient form factors ranging from 0.8B to 9B parameters.

The hallmark of the Qwen 3.5 series is its incredible intelligence density. By leveraging architectural innovations like Gated Delta Networks and sparse Mixture-of-Experts (MoE), Alibaba has managed to create models that punch significantly above their weight class, often outperforming much larger models from previous generations.

The Compact Lineup: From Edge to Cloud

The new release provides a versatile range of models tailored for different deployment scenarios:

  • 0.8B and 2B: Optimized for edge devices, local applications, and ultra-fast inference. These models are ideal for on-device AI where privacy and latency are critical.
  • 4B: A "sweet spot" for lightweight multimodal agents. It offers a strong balance between footprint and reasoning capability, suitable for small-scale AI services.
  • 9B: The high-performance tier of the compact series. Despite its compact size, it approaches the quality of much larger systems and even surpasses the prior Qwen3-30B on several key benchmarks.

Key Innovations and Performance

The performance gains in Qwen 3.5 stem not only from better data but also from architectural refinements and scaled-up training techniques:

  • Native Multimodality: Unlike many compact models that add vision or audio capabilities via adapters, Qwen 3.5 models are natively multimodal from the ground up.
  • Improved Architecture: The use of Gated Delta Networks and sparse MoE allows for higher parameter counts with lower active computational costs.
  • Scaled RL Training: All models underwent extensive Reinforcement Learning (RL) based on reasoning signals, significantly improving their ability to follow complex instructions and perform multi-step tasks.
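The efficiency claim behind sparse MoE is that each token activates only a small subset of experts, so active compute stays low even as total parameter count grows. The following is a minimal illustrative sketch of top-k expert routing, not Qwen's actual architecture: the toy dimensions, linear "experts," and router are all assumptions for demonstration.

```python
import numpy as np

def sparse_moe_layer(x, expert_weights, gate_weights, top_k=2):
    """Route a token to its top-k experts; only those experts compute.

    x: (d,) token hidden state
    expert_weights: list of (d, d) matrices, one toy linear "expert" each
    gate_weights: (num_experts, d) router matrix
    """
    logits = gate_weights @ x                        # router score per expert
    top = np.argsort(logits)[-top_k:]                # indices of the top-k experts
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                             # softmax over the selected experts only
    # Only the top_k experts run; the other experts are skipped entirely,
    # which is why active FLOPs stay far below total parameter count.
    return sum(p * (expert_weights[i] @ x) for p, i in zip(probs, top))

rng = np.random.default_rng(0)
d, num_experts = 8, 16
experts = [rng.normal(size=(d, d)) for _ in range(num_experts)]
gate = rng.normal(size=(num_experts, d))
x = rng.normal(size=d)
y = sparse_moe_layer(x, experts, gate, top_k=2)
print(y.shape)  # prints (8,)
```

With top_k=2 of 16 experts, only 1/8 of the expert parameters participate in each forward step, which is the core trade-off sparse MoE exploits.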

Benchmark Highlights

The Qwen 3.5-9B model stands out as a particularly impressive achievement, scoring 82.5 on MMLU-Pro and 81.7 on GPQA Diamond. In vision tasks, it outperforms GPT-5-Nano on benchmarks like MMMU-Pro (70.1 vs 57.2) and MathVision (78.9 vs 62.2).

Availability and Ecosystem

In line with its commitment to open science, Alibaba has released both the Instruct and Base versions of these models under the Apache 2.0 license. This ensures that developers can freely integrate these high-performance compact models into their own applications.

The weights are available on Hugging Face and ModelScope, providing immediate access to the global AI community.

Conclusion

The Qwen 3.5 compact series proves that "small" no longer means "incapable." By delivering state-of-the-art benchmarks in models as small as 9B and 4B, Alibaba is democratizing access to high-quality AI for developers who don't have access to massive compute clusters. These models are set to become a staple for the next generation of edge-AI and agentic workflows.

Frequently Asked Questions

What model sizes does the Qwen 3.5 compact series include?
The Qwen 3.5 compact series includes 0.8B, 2B, 4B, and 9B parameter models, designed for everything from edge devices to light AI services.

Are the Qwen 3.5 models multimodal?
Yes, all models in the Qwen 3.5 family are natively multimodal, supporting text, image, and video processing across all sizes.

How does the 9B model compare to larger models?
The 9B model outperforms the roughly 3x larger Qwen3-30B on major benchmarks like MMLU-Pro and GPQA Diamond, demonstrating significant intelligence density.

What license are the Qwen 3.5 models released under?
The Qwen 3.5 compact models are released under the permissive Apache 2.0 license, making them accessible for both research and commercial use.
