source: "raw/articles/autonomous-continual-learning-of-computer-use-agents-for-environment-adaptation.md"
Summary: Autonomous Continual Learning of Computer-Use Agents for Environment Adaptation
TL;DR: Introduces ACuRL, a framework enabling computer-use agents to autonomously adapt to new environments through curriculum-based reinforcement learning without human data.
Key Points
- Real-world digital environments are diverse and dynamic, causing agents to encounter unseen scenarios and distribution shifts that require continual learning
- Presents ACuRL (Autonomous Curriculum Reinforcement Learning), which adapts agents with zero human data through a loop of exploration, curriculum task generation, and iterative RL
- Introduces CUAJudge, an automatic evaluator achieving 93% agreement with human judgments through state difference analysis and evidence-grounded verification
- Agent explores target environments to collect initial experiences, then undergoes iterative training with curriculum tasks tailored to current capabilities
- Curriculum generator synthesizes new tasks according to the agent's current success rate on each task: Easy (>70% success) → more complex variants, Medium (30-70%) → diverse scenarios, Hard (<30%) → hierarchical decomposition into simpler subtasks
- Achieves 4-22% performance gains across 6 representative target environments (LibreOffice suite, Thunderbird, Celestia, KAlgebra)
- Demonstrates effective knowledge transfer without catastrophic forgetting - performance preserved or improved on non-target environments
- Parameter analysis reveals highly sparse updates (~20% of parameters significantly changed), which the authors cite to explain robust adaptation without forgetting
- Infrastructure optimizations enable 3-5x training speedup through unified environment management, asynchronous preloading, and batch operations
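The difficulty-based curriculum rule in the key points can be sketched as a small bucketing step. This is a minimal illustration, not ACuRL's implementation: it assumes each task's success rate is estimated from several rollouts judged by an automatic evaluator, it assumes the boundary values 30% and 70% count as Medium (the summary does not say), and the names `TaskStats`, `difficulty_bucket`, `plan_curriculum`, the strategy labels, and the example tasks are all hypothetical. The actual task synthesis (e.g., prompting a model to write harder variants) is out of scope here.

```python
from dataclasses import dataclass

# Thresholds from the summary: Easy >70%, Medium 30-70%, Hard <30%.
EASY_THRESHOLD = 0.7
HARD_THRESHOLD = 0.3


@dataclass
class TaskStats:
    """Hypothetical per-task record: how many judged rollouts succeeded."""
    task: str
    successes: int
    attempts: int

    @property
    def success_rate(self) -> float:
        return self.successes / self.attempts if self.attempts else 0.0


def difficulty_bucket(rate: float) -> str:
    """Classify by success rate; 0.3 and 0.7 themselves count as medium (assumption)."""
    if rate > EASY_THRESHOLD:
        return "easy"
    if rate < HARD_THRESHOLD:
        return "hard"
    return "medium"


# Generation strategy per bucket, as listed in the summary (labels are hypothetical).
STRATEGY = {
    "easy": "more_complex_variants",
    "medium": "diverse_scenarios",
    "hard": "hierarchical_decomposition",
}


def plan_curriculum(stats: list[TaskStats]) -> dict[str, list[str]]:
    """Group tasks by the synthesis strategy the generator would apply next."""
    plan: dict[str, list[str]] = {s: [] for s in STRATEGY.values()}
    for s in stats:
        plan[STRATEGY[difficulty_bucket(s.success_rate)]].append(s.task)
    return plan


# Hypothetical spreadsheet tasks, each attempted 8 times.
stats = [
    TaskStats("rename sheet", 8, 8),  # 100% -> easy
    TaskStats("insert chart", 4, 8),  # 50% -> medium
    TaskStats("pivot table", 1, 8),   # 12.5% -> hard
]
print(plan_curriculum(stats))
# {'more_complex_variants': ['rename sheet'], 'diverse_scenarios': ['insert chart'], 'hierarchical_decomposition': ['pivot table']}
```

Keeping the bucketing separate from generation means the thresholds can be tuned per environment without touching the generator itself.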
Concepts Covered
- Continual Learning — study of agents learning sequentially across environments without forgetting previous knowledge
- Computer Use Agents — autonomous agents that interact with digital environments via screenshots and actions
- Curriculum Learning — adaptive task generation that adjusts difficulty based on agent capabilities
- Reinforcement Learning — iterative optimization using Group Relative Policy Optimization (GRPO) for multi-turn trajectories
- Automatic Evaluation — CUAJudge framework for reliable trajectory assessment without human annotation
- Environment Adaptation — both intra-environment (single environment, increasing complexity) and cross-environment (multiple distinct environments)
- Parameter Sparsity — analysis of which model parameters update during continual learning
- Zero-Data Learning — learning framework requiring no human-annotated data, only target environments
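The Reinforcement Learning entry names GRPO, which drops a learned value function and instead scores each rollout against the other rollouts sampled for the same task, normalizing rewards by the group's mean and standard deviation. The sketch below shows only that advantage computation, assuming binary success rewards from an automatic evaluator such as CUAJudge. The summary does not say how credit is spread across the turns of a multi-turn trajectory, so `per_step_advantages` simply assumes every step inherits its trajectory's advantage. Both function names are hypothetical, the choice of population standard deviation is an assumption (implementations differ), and full GRPO's clipped policy-ratio loss and KL penalty are omitted.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages for one group (all rollouts of the same task).

    Each reward is centered on the group mean and scaled by the group's
    population standard deviation; eps avoids division by zero when
    every rollout gets the same reward (all advantages are then 0).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def per_step_advantages(rewards: list[float], steps_per_rollout: list[int]) -> list[list[float]]:
    """Assumption: each step of a multi-turn trajectory reuses its trajectory's advantage."""
    if len(rewards) != len(steps_per_rollout):
        raise ValueError("need one step count per rollout")
    advs = group_relative_advantages(rewards)
    return [[a] * n for a, n in zip(advs, steps_per_rollout)]


# Four hypothetical rollouts of one task: 1.0 = judged success, 0.0 = failure.
rewards = [1.0, 0.0, 0.0, 1.0]
steps = [5, 9, 12, 6]
for r, step_advs in zip(rewards, per_step_advantages(rewards, steps)):
    print(r, len(step_advs), round(step_advs[0], 3))
```

Successful rollouts receive positive advantages and failed ones negative, so no critic network is needed; a group where every rollout succeeds (or fails) contributes no gradient signal, which is one reason a curriculum that keeps tasks near the agent's ability is useful.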
Figures and Images
- Figure 1: ACuRL framework overview showing exploration → curriculum generation → iterative RL cycle
- Figure 2: Parameter update sparsity analysis across LLM backbone and vision encoder layers
- Figure 3: Task complexity evolution showing increasing average task length across iterations
- Figures 4-6: Parameter overlap analysis and context examples across different environments
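The sparsity result (~20% of parameters significantly changed, broken down by layer in Figure 2) comes from comparing weights before and after adaptation. The summary does not define "significantly changed", so this sketch assumes an absolute per-parameter threshold `tol` (hypothetical default 1e-5) and represents each layer as a flat list of floats; real checkpoints would be tensors, but the counting is the same. The function names and layer names are hypothetical.

```python
def count_changed(before: list[float], after: list[float], tol: float) -> int:
    """Count parameters whose absolute change exceeds tol (threshold is an assumption)."""
    if len(before) != len(after):
        raise ValueError("parameter vectors must have the same length")
    return sum(1 for b, a in zip(before, after) if abs(a - b) > tol)


def sparsity_report(
    before: dict[str, list[float]],
    after: dict[str, list[float]],
    tol: float = 1e-5,
) -> tuple[dict[str, float], float]:
    """Return the per-layer fraction of changed parameters and the overall fraction."""
    per_layer: dict[str, float] = {}
    changed_total = 0
    param_total = 0
    for name, w0 in before.items():
        n_changed = count_changed(w0, after[name], tol)
        per_layer[name] = n_changed / len(w0) if w0 else 0.0
        changed_total += n_changed
        param_total += len(w0)
    overall = changed_total / param_total if param_total else 0.0
    return per_layer, overall


# Hypothetical toy checkpoints: one LLM layer and one vision-encoder layer.
before = {"llm.layers.0.mlp": [0.1, 0.2, 0.3, 0.4, 0.5], "vision.blocks.0.attn": [0.0] * 5}
after = {"llm.layers.0.mlp": [0.1, 0.25, 0.3, 0.4, 0.5], "vision.blocks.0.attn": [0.0] * 5}
per_layer, overall = sparsity_report(before, after)
print(per_layer, overall)
```

Counting integers per layer and dividing once at the end keeps the overall fraction exact, rather than re-deriving it from rounded per-layer ratios.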