source: "raw/articles/mobile-agent-v35-multi-platform-fundamental-gui-agents.md"
Summary: Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents
TL;DR: Introduces GUI-Owl-1.5, a family of native GUI agent models (2B-235B parameters) achieving state-of-the-art performance across 20+ benchmarks for multi-platform GUI automation on desktop, mobile, and browser environments.
Key Points
- GUI-Owl-1.5 ships instruct and thinking variants in five sizes (2B/4B/8B/32B/235B) and supports desktop, mobile, and browser platforms
- Achieves SOTA results: 56.5 on OSWorld, 71.6 on AndroidWorld, 48.4 on WebArena, 80.3 on ScreenSpotPro
- Introduces a Hybrid Data Flywheel that combines simulated and cloud-based environments for efficient trajectory generation
- Implements a Unified CoT Synthesis pipeline to strengthen reasoning, memory, and multi-agent adaptation
- Proposes MRPO (Multi-platform Reinforcement Policy Optimization) for stable RL training across heterogeneous devices
- Uses DAG-based task synthesis for controllable coverage of high-frequency operation patterns
- Incorporates virtual environment-based trajectory production to bypass real-world exploration limitations
- Features hierarchical context management with a sliding-window mechanism for long-horizon tasks
- Supports tool/MCP invocation, memory management, and multi-agent collaboration
- Built on Qwen3-VL with three-stage training: pre-training, supervised fine-tuning, and reinforcement learning
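The sliding-window context point above can be illustrated with a small sketch. This is not the paper's implementation: the class name `SlidingContext`, the two-level layout (recent steps in full, older steps reduced to their action text), and the `window` parameter are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SlidingContext:
    """Hypothetical two-level history: recent steps verbatim, older steps compressed."""
    window: int = 3
    summary: list = field(default_factory=list)   # compressed older steps (action text only)
    recent: deque = field(default_factory=deque)  # full detail for the last `window` steps

    def add(self, action: str, observation: str) -> None:
        self.recent.append((action, observation))
        # Once the window overflows, drop the oldest observation and keep only its action.
        while len(self.recent) > self.window:
            old_action, _ = self.recent.popleft()
            self.summary.append(old_action)

    def render(self) -> str:
        # Build the prompt-side history: compressed lines first, then detailed ones.
        lines = [f"[earlier] {a}" for a in self.summary]
        lines += [f"[recent] {a} -> {o}" for a, o in self.recent]
        return "\n".join(lines)


ctx = SlidingContext(window=2)
ctx.add("tap Settings", "settings screen")
ctx.add("tap Wi-Fi", "wifi menu")
ctx.add("toggle Wi-Fi", "wifi enabled")
print(ctx.render())
```

The design choice shown here is that context grows slowly with task length: only the most recent steps carry full observations, while older steps cost one short line each.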
Concepts Covered
- GUI Agents — end-to-end native models for graphical user interface automation
- Multi-platform Computing — unified approach across desktop, mobile, browser environments
- Chain of Thought Synthesis — augmenting trajectory data with step-wise reasoning and reflection
- Reinforcement Learning for GUI — MRPO algorithm for stable multi-platform policy optimization
- DAG-based Task Generation — directed acyclic graphs for structured task decomposition
- Virtual Environment Training — web-rendering based simulated environments for trajectory generation
- Tool Integration — coordination of GUI operations with external tool and MCP calls
- Context Management — hierarchical compression for long-horizon task execution
- Grounding — visual element localization from natural language queries
- Multi-agent Collaboration — structured frameworks with specialized roles (planner, executor, verifier)
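As a rough illustration of DAG-based task generation, the sketch below builds a task from a dependency graph of subtasks and orders them with a topological sort. The summary does not describe the paper's actual synthesis procedure, so the subtask names, the `SUBTASKS` graph, and the `synthesize_task` helper are hypothetical; only `graphlib.TopologicalSorter` is a real standard-library API (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical subtask graph: each key maps to the set of subtasks it depends on.
SUBTASKS = {
    "open_settings": set(),
    "open_wifi_menu": {"open_settings"},
    "toggle_wifi": {"open_wifi_menu"},
    "open_browser": set(),
    "load_page": {"toggle_wifi", "open_browser"},
}


def synthesize_task(goal: str, graph: dict) -> list:
    """Return the subtasks needed to reach `goal`, in a valid execution order."""
    # Collect the goal and all of its transitive prerequisites.
    needed, stack = set(), [goal]
    while stack:
        node = stack.pop()
        if node not in needed:
            needed.add(node)
            stack.extend(graph[node])
    # Restrict the graph to the needed nodes and order them topologically.
    sub = {n: graph[n] & needed for n in needed}
    return list(TopologicalSorter(sub).static_order())


print(synthesize_task("toggle_wifi", SUBTASKS))
```

Choosing different goal nodes yields tasks of different lengths that reuse the same operation patterns, which is one plausible way to get controllable coverage. `TopologicalSorter` raises `CycleError` if the graph is not acyclic.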