source: "raw/articles/evoskill-automated-skill-discovery-for-multi-agent-systems.md"

Summary: EvoSkill - Automated Skill Discovery for Multi-Agent Systems

TL;DR: EvoSkill automatically discovers and refines reusable agent skills through iterative failure analysis, improving accuracy on OfficeQA (+7.3 percentage points) and SealQA (+12.1 percentage points) while producing skills that transfer zero-shot to new tasks.

Key Points

  • Three-agent framework: Executor (runs tasks), Proposer (analyzes failures), and Skill-Builder (materializes skills)
  • Maintains a Pareto frontier of the top-k agent programs, retaining only skills that improve validation performance
  • Uses textual feedback descent to evolve skills rather than optimizing low-level artifacts like prompts or code
  • Skills are structured as folders containing SKILL.md instructions, metadata, and optional helper scripts
  • On OfficeQA (grounded reasoning over Treasury data), accuracy rose from 60.6% to 67.9%
  • On SealQA (search-augmented QA with noisy retrieval), accuracy rose from 26.6% to 38.7%
  • Skills evolved on SealQA transferred zero-shot to BrowseComp, improving accuracy by 5.3%
  • Uses round-robin parent selection from the frontier and stratified data partitioning into training, validation, and test splits
  • Git-based version control tracks program lineage with each agent configuration as a branch
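
The loop described in the key points (pick a parent from the frontier in round-robin order, propose a skill, keep the child only if validation improves, trim to the top k) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `Program`, `evolve`, `toy_propose`, `toy_evaluate`, and the skill names are all hypothetical, the Proposer and Skill-Builder are collapsed into a single `propose` callback, the Executor is replaced by a deterministic `evaluate` function, and "improves validation performance" is assumed to mean "scores strictly higher than its parent". With a single scalar score, the frontier reduces to a plain top-k list.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Program:
    """Hypothetical agent program: the skills it carries and its validation score."""
    skills: Tuple[str, ...]
    score: float


def evolve(
    base_skills: Sequence[str],
    propose: Callable[[Program, int], str],        # stand-in for Proposer + Skill-Builder
    evaluate: Callable[[Tuple[str, ...]], float],  # stand-in for Executor on the validation split
    k: int,
    iterations: int,
) -> List[Program]:
    base = tuple(base_skills)
    frontier = [Program(base, evaluate(base))]
    for step in range(iterations):
        # Round-robin parent selection over the current frontier.
        parent = frontier[step % len(frontier)]
        child_skills = parent.skills + (propose(parent, step),)
        child = Program(child_skills, evaluate(child_skills))
        # Assumed acceptance rule: keep the child only if it beats its parent.
        if child.score > parent.score:
            frontier.append(child)
            frontier.sort(key=lambda p: p.score, reverse=True)
            del frontier[k:]  # retain only the top-k programs
    return frontier


# Toy stand-ins so the sketch runs end to end (values are made up).
SKILL_GAIN = {"cite-sources": 0.05, "guess-fast": -0.02, "check-units": 0.03}


def toy_propose(parent: Program, step: int) -> str:
    names = list(SKILL_GAIN)
    return names[step % len(names)]


def toy_evaluate(skills: Tuple[str, ...]) -> float:
    return round(0.60 + sum(SKILL_GAIN[s] for s in skills), 4)


if __name__ == "__main__":
    for program in evolve([], toy_propose, toy_evaluate, k=2, iterations=3):
        print(program.score, program.skills)
```

In this toy run the harmful `guess-fast` skill is proposed once and discarded because it lowers the validation score, while the two helpful skills accumulate on one lineage. In the system described above, each surviving program would correspond to a Git branch, and each skill to a folder containing SKILL.md, metadata, and optional helper scripts.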

Concepts Covered

Images/Figures

  • Uncaptioned workflow diagram — Overview of EvoSkill process
  • Figure 1 — Detailed loop diagram showing three agents and iteration process
  • Figure 2 — Performance results on OfficeQA across training splits and tolerance levels

Related Concepts