source: "raw/articles/evoskill-automated-skill-discovery-for-multi-agent-systems.md"
Summary: EvoSkill - Automated Skill Discovery for Multi-Agent Systems
TL;DR: EvoSkill automatically discovers and refines reusable agent skills through iterative failure analysis, improving accuracy on OfficeQA (+7.3 percentage points) and SealQA (+12.1 percentage points) while producing skills that transfer zero-shot to new tasks.
Key Points
- Three-agent framework: Executor (runs tasks), Proposer (analyzes failures), and Skill-Builder (materializes skills)
- Maintains Pareto frontier of top-k agent programs, retaining only skills that improve validation performance
- Uses textual feedback descent to evolve skills rather than optimizing low-level artifacts like prompts or code
- Skills are structured as folders containing SKILL.md instructions, metadata, and optional helper scripts
- On OfficeQA (grounded reasoning over Treasury data), accuracy improved from 60.6% to 67.9%
- On SealQA (search-augmented QA with noisy retrieval), accuracy improved from 26.6% to 38.7%
- Skills evolved on SealQA transferred zero-shot to BrowseComp, improving accuracy by 5.3%
- Uses round-robin parent selection from frontier and stratified data partitioning for training/validation/test splits
- Git-based version control tracks program lineage with each agent configuration as a branch
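The selection loop described in the key points can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `Program`, `evolve`, `propose`, and `evaluate` are hypothetical names, the Proposer and Skill-Builder are collapsed into a single `propose` callback, and the frontier is simplified to a plain top-k list ranked by one scalar validation score.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Program:
    """One agent configuration: a name plus the skills it carries."""
    name: str
    skills: List[str] = field(default_factory=list)
    score: float = 0.0
    parent: Optional[str] = None  # lineage, loosely analogous to a git branch's parent


def evolve(
    base: Program,
    propose: Callable[[Program], str],     # hypothetical stand-in for Proposer + Skill-Builder
    evaluate: Callable[[Program], float],  # hypothetical validation scorer (an Executor run)
    k: int = 3,
    iterations: int = 6,
) -> List[Program]:
    """Return a frontier of at most k programs, best first."""
    base.score = evaluate(base)
    frontier = [base]
    for step in range(iterations):
        # Round-robin parent selection: cycle through the current frontier.
        parent = frontier[step % len(frontier)]
        skill = propose(parent)
        child = Program(
            name=f"{parent.name}+{skill}",
            skills=parent.skills + [skill],
            parent=parent.name,
        )
        child.score = evaluate(child)
        # Retain the new skill only if it improves validation performance.
        if child.score > parent.score:
            frontier.append(child)
            frontier.sort(key=lambda p: p.score, reverse=True)
            frontier = frontier[:k]  # keep the frontier bounded at k programs
    return frontier
```

In the system as summarized, `propose` would be driven by failure analysis on training examples and `evaluate` by a held-out validation split; both are left abstract here so the selection logic stays visible.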
Concepts Covered
- Agent Skills — reusable, domain-specific workflows and code that augment coding agents
- Textual Feedback Descent — optimization framework using natural language feedback rather than scalar rewards
- Pareto Frontier — maintains bounded set of top-performing programs for selection
- Zero-Shot Transfer — skills generalize to unseen tasks without modification
- Multi-Agent Systems — collaborative framework with specialized proposer and builder agents
- Evolutionary Optimization — iterative mutation and selection of agent capabilities
- Grounded Reasoning — OfficeQA benchmark requiring navigation of complex Treasury documents
- Search-Augmented QA — SealQA benchmark testing retrieval under adversarial conditions
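To make the Agent Skills concept more concrete, the sketch below writes one skill to disk as a folder. Only the `SKILL.md` file name comes from the source; the `metadata.json` file name, the `scripts/` subfolder, the `write_skill` helper, and the example skill contents are assumptions made for illustration, since the summary does not specify how metadata or helper scripts are stored.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, Optional


def write_skill(
    root: Path,
    name: str,
    instructions: str,
    metadata: Dict[str, str],
    scripts: Optional[Dict[str, str]] = None,
) -> Path:
    """Create a skill folder under root and return its path."""
    skill_dir = root / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    # SKILL.md holds the natural-language instructions for the agent.
    (skill_dir / "SKILL.md").write_text(instructions, encoding="utf-8")
    # Assumption: metadata is stored as JSON; the real format may differ.
    (skill_dir / "metadata.json").write_text(
        json.dumps(metadata, indent=2), encoding="utf-8"
    )
    # Helper scripts are optional, so the subfolder is created only when needed.
    if scripts:
        scripts_dir = skill_dir / "scripts"
        scripts_dir.mkdir(exist_ok=True)
        for filename, body in scripts.items():
            (scripts_dir / filename).write_text(body, encoding="utf-8")
    return skill_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = write_skill(
            Path(tmp),
            "table-lookup",  # hypothetical skill name
            "# Table lookup\nLocate the table before quoting a figure.\n",
            {"description": "Find values in tabular documents"},
            {"find_table.py": "print('placeholder helper')\n"},
        )
        for entry in sorted(path.rglob("*")):
            print(entry.relative_to(path))
```

Keeping each skill as a self-contained folder is what would let a version-control branch carry a complete agent configuration, as the key points describe.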
Images/Figures
- Overview of the EvoSkill process
- Detailed loop diagram showing the three agents and the iteration process
- Performance results on OfficeQA across training splits and tolerance levels