source: "raw/articles/topocurate-modeling-interaction-topology-for-tool-use-agent-training.md"
Summary: TopoCurate: Modeling Interaction Topology for Tool-Use Agent Training
TL;DR: TopoCurate transforms agent training data curation from outcome-based filtering to topological interaction modeling, using a semantic quotient topology to select high-quality training trajectories and RL tasks that maximize learning value.
Key Points
- Core Problem: Standard tool-use agent training relies on outcome-based filtering (successful trajectories for SFT, pass-rate thresholds for RL), which ignores interaction dynamics and creates the "Outcome Equivalence Illusion"
- Two Key Issues: In SFT, selecting only successful trajectories introduces a selection bias that causes covariate shift and mode collapse; in RL, pass-rate thresholds treat tasks with identical pass rates as interchangeable even when their training potential differs vastly, which contributes to vanishing gradients
- Methodology: Projects multi-trial rollouts into a unified semantic quotient topology by merging equivalent action-observation states, transforming linear trajectories into structured manifolds
- SFT Selection Metrics: Three topological metrics: Reflective Recovery (error correction), Semantic Efficiency (redundancy reduction), and Distributional Diversity (mode collapse prevention)
- RL Selection Metrics: Error Branch Ratio and Strategic Heterogeneity to maximize gradient Signal-to-Noise Ratio
- Results: Consistent gains of 4.2% (SFT) and 6.9% (RL) over baselines on BFCLv3 and Tau2 Bench
- Theoretical Foundation: SFT selection minimizes KL divergence to a robust expert policy; RL selection maximizes the Fisher information of policy gradients
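The merging step described under Methodology can be sketched as follows. This is a minimal illustration, not the paper's implementation: the summary does not give the equivalence relation or the exact metric definitions, so `canonical` (a case- and whitespace-insensitive match) and the edge-based `error_branch_ratio` are hypothetical stand-ins chosen only to show how several linear rollouts of one task collapse into a shared graph.

```python
from collections import defaultdict

ROOT = ("<root>", "")  # hypothetical shared start node for every rollout


def canonical(action, observation):
    """Hypothetical equivalence key: ignore case and extra whitespace."""
    def norm(s):
        return " ".join(s.lower().split())
    return (norm(action), norm(observation))


def build_quotient_graph(rollouts):
    """Merge rollouts of one task into a graph over equivalence classes.

    rollouts: list of (steps, success), where steps is a list of
    (action, observation) string pairs and success is a bool.
    Returns (edges, on_success_path).
    """
    edges = defaultdict(set)
    on_success_path = {ROOT: False}
    for steps, success in rollouts:
        prev = ROOT
        on_success_path[ROOT] = on_success_path[ROOT] or success
        for action, observation in steps:
            node = canonical(action, observation)
            edges[prev].add(node)  # equivalent steps share a single node
            on_success_path[node] = on_success_path.get(node, False) or success
            prev = node
    return dict(edges), on_success_path


def error_branch_ratio(edges, on_success_path):
    """Assumed definition: share of edges whose target no successful rollout visited."""
    targets = [dst for dsts in edges.values() for dst in dsts]
    if not targets:
        return 0.0
    failing = sum(1 for dst in targets if not on_success_path[dst])
    return failing / len(targets)


# Three rollouts of the same task; the first step is equivalent in all of them.
rollouts = [
    ([("search flights", "3 results"), ("book flight", "ok")], True),
    ([("Search  Flights", "3 results"), ("book hotel", "error")], False),
    ([("search flights", "3 results"), ("book flight", "ok")], True),
]
edges, on_success_path = build_quotient_graph(rollouts)
ratio = error_branch_ratio(edges, on_success_path)
```

In this toy case the three rollouts share one first node and then split into a successful branch and a failing one, so the graph exposes a decision point that a pass-rate number alone would hide. A semantic equivalence check (for example, embedding similarity) would presumably replace the string normalization in practice.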
Concepts Covered
- Semantic Quotient Topology — mathematical framework for projecting interaction trajectories into structured manifolds
- Reflective Recovery — prioritizing trajectories that demonstrate error correction and self-healing behaviors
- Covariate Shift — distribution mismatch between training and test environments in agent deployment
- Mode Collapse — agent overfitting to narrow behavioral patterns rather than learning diverse strategies
- Signal-to-Noise Ratio — gradient quality metric for reinforcement learning optimization
- Error Branch Ratio — proportion of decision branches leading to failure, indicating task structural complexity
- Strategic Heterogeneity — diversity of valid solution paths within a task
- Fisher Information Matrix — quantifies policy sensitivity to parameter changes in RL
- Group Relative Policy Optimization (GRPO) — RL algorithm using group-relative advantages for policy updates
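To make the link between GRPO and the gradient-vanishing issue concrete, the sketch below computes group-relative advantages in the commonly used form: each rollout's reward minus the group mean, divided by the group standard deviation. The function name, the `eps` stabilizer, and the use of the population standard deviation are assumptions for illustration; implementations differ in these details, and this is not taken from the TopoCurate paper.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own rollout group (assumed GRPO-style form)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mean) / (std + eps) for r in rewards]


# A task where rollouts disagree yields non-zero advantages.
mixed = group_relative_advantages([1.0, 0.0, 1.0, 0.0])

# A task where every rollout gets the same reward yields all-zero advantages,
# so it contributes no policy-gradient signal for that update.
uniform = group_relative_advantages([1.0, 1.0, 1.0, 1.0])
```

When every rollout in a group succeeds or every rollout fails, the advantages are all zero, which is the degenerate case that task selection aims to avoid. Two tasks with the same intermediate pass rate produce similar advantage magnitudes under this formula, which is why a structural signal such as Error Branch Ratio or Strategic Heterogeneity is needed to tell them apart.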
Images and Figures
- Figure 1: Overview diagram showing three-stage TopoCurate framework (topological modeling, trajectory selection for SFT, task selection for RL)
- Figure 2: Pass@k performance comparison across model scales and domains
- Figure 3: Model behavior analysis comparing reflection rates, efficiency, and diversity metrics
- Figure 4: RL training dynamics showing evaluation accuracy across domains
- Figure 5: Training reward curves demonstrating impact of topological curation