source: "raw/articles/mobile-agent-v35-multi-platform-fundamental-gui-agents.md"

Summary: Mobile-Agent-v3.5: Multi-platform Fundamental GUI Agents

TL;DR: Introduces GUI-Owl-1.5, a family of native GUI agent models (2B-235B parameters) achieving state-of-the-art performance across 20+ benchmarks for multi-platform GUI automation on desktop, mobile, and browser environments.

Key Points

  • GUI-Owl-1.5 comes in instruct and thinking variants at five sizes (2B/4B/8B/32B/235B), supporting desktop, mobile, and browser platforms
  • Achieves SOTA results: 56.5 on OSWorld, 71.6 on AndroidWorld, 48.4 on WebArena, 80.3 on ScreenSpot-Pro
  • Introduces a Hybrid Data Flywheel that combines simulated and cloud-based environments for efficient trajectory generation
  • Implements a Unified CoT Synthesis pipeline to strengthen reasoning, memory, and multi-agent adaptation
  • Proposes MRPO (Multi-platform Reinforcement Policy Optimization) for stable RL training across heterogeneous devices
  • Uses DAG-based task synthesis for controllable coverage of high-frequency operation patterns
  • Incorporates virtual environment-based trajectory production to bypass real-world exploration limitations
  • Features hierarchical context management with a sliding-window mechanism for long-horizon tasks
  • Supports tool/MCP invocation, memory management, and multi-agent collaboration
  • Built on Qwen3-VL with three-stage training: pre-training, supervised fine-tuning, and reinforcement learning
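
The sliding-window context management mentioned in the key points can be illustrated with a minimal sketch. This summary does not describe how GUI-Owl-1.5 actually implements it, so everything below is an assumption made for illustration: the names `Step` and `SlidingWindowContext` are hypothetical, and "summarizing" an old step is reduced to keeping only its action string. The idea shown is simply that the most recent steps stay in full detail while older steps are compacted, which bounds prompt growth on long-horizon tasks.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Step:
    """One agent step: the action taken and what was observed afterwards."""
    action: str
    observation: str


@dataclass
class SlidingWindowContext:
    """Hypothetical two-level context: compact history plus a detailed recent window."""
    window_size: int = 3
    summaries: list[str] = field(default_factory=list)  # compacted older steps
    recent: deque = field(default_factory=deque)        # full-detail recent steps

    def add(self, step: Step) -> None:
        # Append the new step, then compact anything that falls out of the window.
        self.recent.append(step)
        while len(self.recent) > self.window_size:
            old = self.recent.popleft()
            # Assumption: compaction keeps only the action and drops the observation.
            self.summaries.append(old.action)

    def render(self) -> str:
        # Older steps appear as one-line summaries, recent steps in full.
        lines = [f"[earlier] {s}" for s in self.summaries]
        lines += [f"[recent] {s.action} -> {s.observation}" for s in self.recent]
        return "\n".join(lines)


if __name__ == "__main__":
    ctx = SlidingWindowContext(window_size=2)
    ctx.add(Step("open Settings", "Settings screen visible"))
    ctx.add(Step("tap Wi-Fi", "Wi-Fi list visible"))
    ctx.add(Step("toggle Wi-Fi off", "Wi-Fi disabled"))
    print(ctx.render())
```

With `window_size=2` and three steps added, the first step is compacted into the summary list and the last two remain in full. A real agent would likely replace the compaction rule with a model-generated summary.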

Concepts Covered

Related Concepts