Alibaba GUI-Owl-1.5 & Mobile-Agent-v3.5: The Next Era of GUI Agents

Alibaba Tongyi Lab reveals GUI-Owl-1.5 and Mobile-Agent-v3.5, a powerful family of open-source models designed for autonomous GUI interaction across desktop and mobile.

by HowAIWorks Team
Tags: Alibaba, GUI Agents, Open Source AI, Qwen3-VL, Mobile Agent, Multimodal AI, Autonomous Agents, Artificial Intelligence


Introduction

Alibaba Tongyi Lab has recently released the open-source GUI-Owl-1.5 and Mobile-Agent-v3.5, a new family of agentic models that represent a major step toward fully autonomous AI assistants. These models are specifically designed to interact directly with graphical user interfaces (GUIs), whether on a desktop computer, a mobile device, or within a web browser.

Built upon the robust Qwen3-VL foundation, this new release aims to provide a unified paradigm for GUI interaction, allowing AI agents to navigate, click, type, and execute tasks just like a human user would.
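
To make this concrete, a GUI agent's output at each step is usually a structured action rather than free-form text. The snippet below is purely illustrative (the action names, fields, and coordinates are assumptions, not the release's actual schema), but it shows the kind of primitives, such as click and type, that such an agent emits while completing a task.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class GUIAction:
    """Illustrative action record; the real GUI-Owl-1.5 action space may differ."""
    kind: str                                 # e.g. "click", "type", "swipe"
    target: Optional[Tuple[int, int]] = None  # screen coordinates for click/swipe
    text: Optional[str] = None                # payload for "type" actions

# Hypothetical trajectory for "reply to the latest chat message":
trajectory = [
    GUIAction(kind="click", target=(540, 1830)),              # tap the input field
    GUIAction(kind="type", text="Running ten minutes late"),  # enter the reply
    GUIAction(kind="click", target=(1020, 1830)),             # tap the send button
]

for step in trajectory:
    print(step)
```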

Unified GUI Agent Paradigm

The GUI-Owl-1.5 and Mobile-Agent-v3.5 family is structured to handle a wide range of tasks with varying complexity and latency requirements. Alibaba has released six different sizes to cater to different needs:

  • 2B / 4B / 8B / 32B Instruct: These are fast, high-efficiency models designed for low-latency tasks. They operate without Chain-of-Thought (CoT), making them ideal for quick interactions.
  • 8B / 32B Thinking: These models are optimized for complex reasoning. They incorporate advanced planning and multi-step reasoning capabilities, allowing them to solve more sophisticated GUI workflows.
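
Since both variants ship as standard vision-language checkpoints, they should load like any other Hugging Face-compatible VLM. The sketch below assumes a hypothetical checkpoint name and the generic AutoModelForVision2Seq class; check the official ModelScope or GitHub pages for the exact model IDs and recommended inference code.

```python
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

# Hypothetical checkpoint name; see ModelScope / the X-PLUG GitHub repo for the real IDs.
MODEL_ID = "Alibaba-Tongyi/GUI-Owl-1.5-8B-Instruct"

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForVision2Seq.from_pretrained(MODEL_ID, device_map="auto", trust_remote_code=True)

screenshot = Image.open("screenshot.png")  # current screen of the device being controlled
prompt = "Task: open the Settings app. What is the next action?"

inputs = processor(images=screenshot, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```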

Architecture and Training Innovations

The impressive performance of these models is rooted in three core architectural and training pillars:

  • Hybrid Data Flywheel: By combining physical simulations with cloud sandboxes, the team generated large volumes of GUI trajectories, which were verified at specific checkpoints to ensure high-quality training data (a toy sketch of this checkpoint filtering follows this list).
  • Unified CoT Synthesis: The models integrate world modeling, knowledge injection, and tool/MCP (Model Context Protocol) reasoning into every step of their operation.
  • MRPO (Multi-platform Reinforcement Learning): A specialized RL approach featuring an online rollout buffer and protection against "outcome collapse," ensuring the models remain stable and versatile across different operating systems and platforms.
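
The checkpoint verification in the data flywheel can be pictured as filtering rollouts whose intermediate UI states satisfy programmatic checks. The toy sketch below uses assumed names (GUIStep, checkpoint predicates) and is only meant to convey the shape of that filtering, not the actual pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class GUIStep:
    action: str               # e.g. "open_app(Settings)"
    ui_state: Dict[str, str]  # simplified snapshot of the screen after the action

# A checkpoint is a predicate over the UI state at a given step index.
Checkpoint = Callable[[Dict[str, str]], bool]

def trajectory_passes(steps: List[GUIStep], checkpoints: Dict[int, Checkpoint]) -> bool:
    """Keep a rollout for training only if every checkpoint predicate holds."""
    return all(
        idx < len(steps) and check(steps[idx].ui_state)
        for idx, check in checkpoints.items()
    )

# Toy example: require that Wi-Fi ends up enabled by the final step.
rollout = [
    GUIStep("open_app(Settings)", {"foreground_app": "Settings"}),
    GUIStep("click(Wi-Fi)", {"foreground_app": "Settings", "screen": "Wi-Fi"}),
    GUIStep("toggle(Wi-Fi)", {"foreground_app": "Settings", "wifi": "on"}),
]
print(trajectory_passes(rollout, {2: lambda s: s.get("wifi") == "on"}))  # True -> keep
```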

SOTA Performance across Benchmarks

Alibaba's models set new open-source state-of-the-art (SOTA) records on more than 20 GUI agent benchmarks. Some of the most notable results include:

  • OSWorld-Verified: 56.5 (32B-Instruct)
  • AndroidWorld: 71.6 (8B-Thinking)
  • ScreenSpot-Pro: 80.3 (using a two-stage crop-and-refine technique; see the coarse-to-fine grounding sketch below)
  • WebArena: 48.4 (32B-Thinking)

These scores demonstrate a significant leap in the ability of open-source models to accurately perceive and interact with complex, real-world digital environments.
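
The ScreenSpot-Pro score relies on coarse-to-fine grounding: a first pass predicts a rough location on the full screenshot, the image is cropped around that guess, and a second pass refines the coordinates on the crop. The sketch below only illustrates that two-stage control flow; predict_point is a hypothetical stand-in for the model call, not part of the released code.

```python
from PIL import Image

def predict_point(image: Image.Image, query: str) -> tuple[int, int]:
    """Hypothetical stand-in for a grounding-model call returning an (x, y) guess."""
    raise NotImplementedError  # swap in the actual model inference here

def two_stage_ground(screenshot: Image.Image, query: str, crop_size: int = 1024) -> tuple[int, int]:
    # Stage 1: coarse prediction on the full, often very high-resolution, screenshot.
    cx, cy = predict_point(screenshot, query)

    # Crop a fixed-size window around the coarse guess, clamped to the image bounds.
    left = max(0, min(cx - crop_size // 2, screenshot.width - crop_size))
    top = max(0, min(cy - crop_size // 2, screenshot.height - crop_size))
    crop = screenshot.crop((left, top, left + crop_size, top + crop_size))

    # Stage 2: refine on the crop, then map back to full-image coordinates.
    rx, ry = predict_point(crop, query)
    return left + rx, top + ry
```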

Conclusion

The release of GUI-Owl-1.5 and Mobile-Agent-v3.5 marks a significant milestone in the development of "Large Action Models." By providing open-source access to models that can seamlessly bridge the gap between AI and human-like computer interaction, Alibaba is accelerating the path toward truly autonomous digital companions. Whether it's automating repetitive office tasks or navigating complex mobile apps, these agents are proving that the future of UI is not just responsive—it's agentic.

Frequently Asked Questions

What are GUI-Owl-1.5 and Mobile-Agent-v3.5?
These are open-source GUI agent models from Alibaba Tongyi Lab, built on Qwen3-VL, capable of directly controlling desktop, mobile, and browser interfaces.

What model sizes are available?
The models come in six sizes: 2B, 4B, 8B, and 32B Instruct (for low-latency tasks) and 8B and 32B Thinking (for complex reasoning and planning).

What is the difference between the 'Instruct' and 'Thinking' variants?
While 'Instruct' models focus on fast execution without Chain-of-Thought, 'Thinking' models incorporate advanced reasoning and planning for more complex multi-step GUI tasks.

How do the models perform on benchmarks?
They are currently open-source SOTA on over 20 benchmarks, including OSWorld (56.5), AndroidWorld (71.6), and ScreenSpot-Pro (80.3).

Where are the models and code available?
The models are available on ModelScope, and the source code is hosted on GitHub under the X-PLUG/MobileAgent repository.
