Alibaba GUI-Owl-1.5 & Mobile-Agent-v3.5: The Next Era of GUI Agents

Alibaba Tongyi Lab reveals GUI-Owl-1.5 and Mobile-Agent-v3.5, a powerful family of open-source models designed for autonomous GUI interaction across desktop and mobile.

by HowAIWorks Team
Tags: Alibaba, GUI Agents, Open Source AI, Qwen3-VL, Mobile Agent, Multimodal AI, Autonomous Agents, Artificial Intelligence


Introduction

Alibaba Tongyi Lab has recently released the open-source GUI-Owl-1.5 and Mobile-Agent-v3.5, a new family of agentic models that represent a major step toward fully autonomous AI assistants. These models are specifically designed to interact directly with graphical user interfaces (GUIs), whether on a desktop computer, a mobile device, or within a web browser.

Built upon the robust Qwen3-VL foundation, this new release aims to provide a unified paradigm for GUI interaction, allowing AI agents to navigate, click, type, and execute tasks just like a human user would.
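
To make this concrete, a GUI agent's output at each step is usually a structured action rather than free-form text. The snippet below is purely illustrative (the action names, fields, and coordinates are assumptions, not the release's actual schema), but it shows the kind of primitives, such as click and type, that such an agent emits while completing a task.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class GUIAction:
    """Illustrative action record; the real GUI-Owl-1.5 action space may differ."""
    kind: str                                 # e.g. "click", "type", "swipe"
    target: Optional[Tuple[int, int]] = None  # screen coordinates for click/swipe
    text: Optional[str] = None                # payload for "type" actions

# Hypothetical trajectory for "reply to the latest chat message":
trajectory = [
    GUIAction(kind="click", target=(540, 1830)),              # tap the input field
    GUIAction(kind="type", text="Running ten minutes late"),  # enter the reply
    GUIAction(kind="click", target=(1020, 1830)),             # tap the send button
]

for step in trajectory:
    print(step)
```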

Unified GUI Agent Paradigm

The GUI-Owl-1.5 and Mobile-Agent-v3.5 family is structured to handle a wide range of tasks with varying complexity and latency requirements. Alibaba has released six different sizes to cater to different needs:

  • 2B / 4B / 8B / 32B Instruct: These are fast, high-efficiency models designed for low-latency tasks. They operate without Chain-of-Thought (CoT), making them ideal for quick interactions.
  • 8B / 32B Thinking: These models are optimized for complex reasoning. They incorporate advanced planning and multi-step reasoning capabilities, allowing them to solve more sophisticated GUI workflows.
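
Since both variants ship as standard vision-language checkpoints, they should load like any other Hugging Face-compatible VLM. The sketch below assumes a hypothetical checkpoint name and the generic AutoModelForVision2Seq class; check the official ModelScope or GitHub pages for the exact model IDs and recommended inference code.

```python
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

# Hypothetical checkpoint name; see ModelScope / the X-PLUG GitHub repo for the real IDs.
MODEL_ID = "Alibaba-Tongyi/GUI-Owl-1.5-8B-Instruct"

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForVision2Seq.from_pretrained(MODEL_ID, device_map="auto", trust_remote_code=True)

screenshot = Image.open("screenshot.png")  # current screen of the device being controlled
prompt = "Task: open the Settings app. What is the next action?"

inputs = processor(images=screenshot, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```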

Architecture and Training Innovations

The impressive performance of these models is rooted in three core architectural and training pillars:

  • Hybrid Data Flywheel: By combining physical simulations with cloud sandboxes, the team generated large volumes of GUI trajectories, which were verified at specific checkpoints to ensure high-quality training data (a toy sketch of this checkpoint filtering follows this list).
  • Unified CoT Synthesis: The models integrate world modeling, knowledge injection, and tool/MCP (Model Context Protocol) reasoning into every step of their operation.
  • MRPO (Multi-platform Reinforcement Learning): A specialized RL approach featuring an online rollout buffer and protection against "outcome collapse," ensuring the models remain stable and versatile across different operating systems and platforms.
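
The checkpoint verification in the data flywheel can be pictured as filtering rollouts whose intermediate UI states satisfy programmatic checks. The toy sketch below uses assumed names (GUIStep, checkpoint predicates) and is only meant to convey the shape of that filtering, not the actual pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class GUIStep:
    action: str               # e.g. "open_app(Settings)"
    ui_state: Dict[str, str]  # simplified snapshot of the screen after the action

# A checkpoint is a predicate over the UI state at a given step index.
Checkpoint = Callable[[Dict[str, str]], bool]

def trajectory_passes(steps: List[GUIStep], checkpoints: Dict[int, Checkpoint]) -> bool:
    """Keep a rollout for training only if every checkpoint predicate holds."""
    return all(
        idx < len(steps) and check(steps[idx].ui_state)
        for idx, check in checkpoints.items()
    )

# Toy example: require that Wi-Fi ends up enabled by the final step.
rollout = [
    GUIStep("open_app(Settings)", {"foreground_app": "Settings"}),
    GUIStep("click(Wi-Fi)", {"foreground_app": "Settings", "screen": "Wi-Fi"}),
    GUIStep("toggle(Wi-Fi)", {"foreground_app": "Settings", "wifi": "on"}),
]
print(trajectory_passes(rollout, {2: lambda s: s.get("wifi") == "on"}))  # True -> keep
```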

SOTA Performance across Benchmarks

Alibaba's models set new open-source state-of-the-art (SOTA) records on more than 20 GUI agent benchmarks. Some of the most notable results include:

  • OSWorld-Verified: 56.5 (32B-Instruct)
  • AndroidWorld: 71.6 (8B-Thinking)
  • ScreenSpot-Pro: 80.3 (using a two-stage crop-and-refine technique; see the coarse-to-fine grounding sketch below)
  • WebArena: 48.4 (32B-Thinking)

These scores demonstrate a significant leap in the ability of open-source models to accurately perceive and interact with complex, real-world digital environments.
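
The ScreenSpot-Pro score relies on coarse-to-fine grounding: a first pass predicts a rough location on the full screenshot, the image is cropped around that guess, and a second pass refines the coordinates on the crop. The sketch below only illustrates that two-stage control flow; predict_point is a hypothetical stand-in for the model call, not part of the released code.

```python
from PIL import Image

def predict_point(image: Image.Image, query: str) -> tuple[int, int]:
    """Hypothetical stand-in for a grounding-model call returning an (x, y) guess."""
    raise NotImplementedError  # swap in the actual model inference here

def two_stage_ground(screenshot: Image.Image, query: str, crop_size: int = 1024) -> tuple[int, int]:
    # Stage 1: coarse prediction on the full, often very high-resolution, screenshot.
    cx, cy = predict_point(screenshot, query)

    # Crop a fixed-size window around the coarse guess, clamped to the image bounds.
    left = max(0, min(cx - crop_size // 2, screenshot.width - crop_size))
    top = max(0, min(cy - crop_size // 2, screenshot.height - crop_size))
    crop = screenshot.crop((left, top, left + crop_size, top + crop_size))

    # Stage 2: refine on the crop, then map back to full-image coordinates.
    rx, ry = predict_point(crop, query)
    return left + rx, top + ry
```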

Conclusion

The release of GUI-Owl-1.5 and Mobile-Agent-v3.5 marks a significant milestone in the development of "Large Action Models." By providing open-source access to models that can seamlessly bridge the gap between AI and human-like computer interaction, Alibaba is accelerating the path toward truly autonomous digital companions. Whether it's automating repetitive office tasks or navigating complex mobile apps, these agents are proving that the future of UI is not just responsive—it's agentic.

Frequently Asked Questions

What are GUI-Owl-1.5 and Mobile-Agent-v3.5?
These are open-source GUI agent models from Alibaba Tongyi Lab, built on Qwen3-VL, capable of directly controlling desktop, mobile, and browser interfaces.

What model sizes are available?
The models come in six sizes: 2B, 4B, 8B, and 32B Instruct (for low-latency tasks) and 8B and 32B Thinking (for complex reasoning and planning).

What is the difference between the 'Instruct' and 'Thinking' variants?
While 'Instruct' models focus on fast execution without Chain-of-Thought, 'Thinking' models incorporate advanced reasoning and planning for more complex multi-step GUI tasks.

How do the models perform on benchmarks?
They are currently open-source SOTA on over 20 benchmarks, including OSWorld (56.5), AndroidWorld (71.6), and ScreenSpot-Pro (80.3).

Where are the models and code available?
The models are available on ModelScope, and the source code is hosted on GitHub under the X-PLUG/MobileAgent repository.
