source: "raw/articles/ui-voyager-a-self-evolving-gui-agent-learning-via-failed-experience.md"

Summary: UI-Voyager: A Self-Evolving GUI Agent Learning via Failed Experience

TL;DR: UI-Voyager is a mobile GUI agent that achieves an 81.0% success rate on AndroidWorld by learning from failed trajectories through fork point detection and self-distillation, outperforming larger models and exceeding reported human performance (80.0%).

Key Points

Performance: The 4B model achieves an 81.0% Pass@1 success rate on AndroidWorld, exceeding human performance (80.0%) and all baseline methods, including much larger models (235B parameters)
Two-stage training framework:
- Stage 1: Rejection Fine-Tuning (RFT) for automatic data-model co-evolution
- Stage 2: Group Relative Self-Distillation (GRSD) for learning from failed trajectories
Fork point detection: Uses SSIM-based matching to identify critical decision points where successful and failed trajectories diverge
Credit assignment solution: Addresses the sparse-reward problem in long-horizon GUI tasks by providing dense step-level supervision
Self-corrective learning: Transforms failed trajectories into high-quality training data without manual annotation
Evaluation: Tested on 116 diverse AndroidWorld tasks across real-world mobile applications
Comparison: GRSD significantly outperforms standard RL methods (GRPO, PPO), which plateau around 76%
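
As a rough illustration of the fork point idea in the Key Points, the following Python sketch compares a failed and a successful trajectory for the same task. It is not the paper's implementation: the single-window `global_ssim`, the 0.95 threshold, the trajectory layout (lists of `(screen, action)` pairs, with each screen a flat list of grayscale pixels), and the rule "matching screen, different action" are all assumptions made here for illustration.

```python
def global_ssim(a, b, dynamic_range=255.0):
    """Single-window SSIM over two equal-length grayscale pixel lists.

    Simplification: real SSIM is usually averaged over local windows.
    """
    if len(a) != len(b) or not a:
        raise ValueError("screens must be non-empty and the same size")
    n = len(a)
    c1 = (0.01 * dynamic_range) ** 2  # standard SSIM stabilizing constants
    c2 = (0.03 * dynamic_range) ** 2
    mu_a = sum(a) / n
    mu_b = sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a * mu_a + mu_b * mu_b + c1) * (var_a + var_b + c2)
    return numerator / denominator


def find_fork_point(failed, succeeded, threshold=0.95):
    """Return (failed_step, success_step) for the earliest failed step whose
    screen matches a successful screen but whose action differs, else None.

    Hypothetical rule: the matching criterion and threshold are assumptions.
    """
    for i, (f_screen, f_action) in enumerate(failed):
        for j, (s_screen, s_action) in enumerate(succeeded):
            same_state = global_ssim(f_screen, s_screen) >= threshold
            if same_state and f_action != s_action:
                return i, j
    return None


# Toy 2x2 "screens" flattened to four pixels each (made-up data).
home = [10, 10, 200, 200]
settings = [200, 10, 10, 200]
bluetooth = [10, 200, 10, 200]
wifi = [200, 200, 10, 10]

succeeded = [(home, "open_settings"), (settings, "tap_bluetooth"), (bluetooth, "toggle_off")]
failed = [(home, "open_settings"), (settings, "tap_wifi"), (wifi, "scroll")]

print(find_fork_point(failed, succeeded))  # (1, 1)
```

In this toy case both trajectories agree at step 0 and diverge on the settings screen, so the fork is reported at step 1. A corrective training sample could then pair the failed trajectory's context at that step with the successful action, though the exact construction used by the paper is not described in this summary.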

Concepts Covered

Multimodal Large Language Models — backbone architecture using Qwen3-VL-4B-Instruct
Reinforcement Learning — addresses credit assignment challenges in sparse reward environments
Group Relative Policy Optimization — baseline RL method that GRSD outperforms
AndroidWorld Benchmark — evaluation environment with 116 diverse mobile GUI tasks
Rejection Sampling — filtering mechanism for high-quality trajectory collection
Self-Distillation — knowledge transfer from successful to failed trajectories
SSIM Image Matching — computer vision technique for identifying equivalent screen states
Mobile GUI Automation — target application domain for autonomous phone operation
Credit Assignment Problem — fundamental RL challenge addressed by fork point detection
Self-Evolving Training — iterative improvement without manual data annotation
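
To make the credit assignment issue concrete, here is a minimal sketch of the group-relative advantage commonly used in GRPO: each rollout's reward is normalized by the mean and standard deviation of its group. With a single binary success reward per trajectory, every step in that trajectory inherits the same scalar, so the signal does not say which step caused the failure. The function names, the use of the population standard deviation, and the `eps` term are choices made for this sketch, not details taken from the paper.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward against its group (GRPO-style)."""
    mean = fmean(rewards)
    std = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mean) / (std + eps) for r in rewards]


def broadcast_to_steps(advantages, step_counts):
    """Copy each trajectory-level advantage to every step of that trajectory."""
    return [[a] * n for a, n in zip(advantages, step_counts)]


# Four rollouts of one task: two succeed (reward 1), two fail (reward 0).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# Hypothetical trajectory lengths of 3, 5, 4, and 2 steps.
per_step = broadcast_to_steps(advantages, [3, 5, 4, 2])
print(per_step[1])  # five identical negative values, one per step
```

All five steps of the second (failed) rollout receive the same negative advantage, including any steps that were actually correct. Step-level supervision derived from fork points is the summary's stated alternative to this uniform signal.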

Images and Figures

Figure 1 (raw/articles/2603.24533v1/x1.png): Performance comparison showing UI-Voyager achieving 81.0% vs other models
Figure 2 (raw/articles/2603.24533v1/x2.png): Training pipeline overview showing RFT and GRSD stages
Figure 3 (raw/articles/2603.24533v1/x3.png): Fork point detection illustration with successful/failed trajectory comparison
Figure 4 (raw/articles/2603.24533v1/x4.png): RFT performance improvements and RL baseline comparisons
Figure 5 (raw/articles/2603.24533v1/x5.png): BrowserMaze task example showing fork point at step 12
Figure 6 (raw/articles/2603.24533v1/x6.png): SystemBluetoothTurnOff task example with fork point at step 0
Figure 7 (raw/articles/2603.24533v1/x7.png): Self-corrective sample construction process
Figure 8 (raw/articles/2603.24533v1/x8.png): GRSD vs GRPO/PPO training performance comparison
Figure 9 (raw/articles/2603.24533v1/x9.png): Performance on low-success-rate tasks
