source: "raw/articles/topocurate-modeling-interaction-topology-for-tool-use-agent-training.md"

Summary: TopoCurate: Modeling Interaction Topology for Tool-Use Agent Training

TL;DR: TopoCurate transforms agent training data curation from outcome-based filtering to topological interaction modeling, using a semantic quotient topology to select high-quality training trajectories and RL tasks that maximize learning value.

Key Points

Core Problem: Standard tool-use agent training relies on outcome-based filtering (successful trajectories for SFT, pass-rate thresholds for RL), which ignores interaction dynamics and creates the "Outcome Equivalence Illusion"
Two Key Issues: SFT selection bias causes covariate shift and mode collapse; RL gradient vanishing occurs when tasks with identical pass rates have vastly different training potential
Methodology: Projects multi-trial rollouts into a unified semantic quotient topology by merging equivalent action-observation states, transforming linear trajectories into structured manifolds
SFT Selection Metrics: Three topological metrics: Reflective Recovery (error correction), Semantic Efficiency (redundancy reduction), and Distributional Diversity (mode collapse prevention)
RL Selection Metrics: Error Branch Ratio and Strategic Heterogeneity to maximize gradient Signal-to-Noise Ratio
Results: Consistent gains of 4.2% (SFT) and 6.9% (RL) over baselines on BFCLv3 and Tau2 Bench
Theoretical Foundation: SFT minimizes KL divergence to robust expert policy; RL maximizes Fisher information of policy gradients
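
The merging step described under Methodology can be illustrated with a minimal sketch. This is not the paper's implementation: the equivalence rule in `state_key` (case- and whitespace-insensitive string matching), the `<start>` root node, and the toy rollouts are all assumptions made for illustration. The summary does not specify how semantic equivalence is actually decided.

```python
from collections import defaultdict


def state_key(action: str, observation: str) -> tuple:
    """Hypothetical equivalence rule: ignore case and extra whitespace."""
    def norm(text: str) -> str:
        return " ".join(text.lower().split())
    return norm(action), norm(observation)


def build_quotient_graph(rollouts):
    """Merge equivalent (action, observation) steps across rollouts.

    Returns (edges, visits): edges maps a node to the set of its successor
    nodes, visits counts how many rollout steps landed on each node.
    """
    start = ("<start>", "")  # assumed shared root for all rollouts
    edges = defaultdict(set)
    visits = defaultdict(int)
    for rollout in rollouts:
        prev = start
        for action, observation in rollout:
            node = state_key(action, observation)
            edges[prev].add(node)
            visits[node] += 1
            prev = node
    return dict(edges), dict(visits)


# Two toy rollouts of the same task; the second recovers from an error.
rollouts = [
    [("search_flights(NYC)", "3 results"), ("book(id=1)", "ok")],
    [("Search_Flights(NYC)", "3  results"),
     ("book(id=9)", "error: bad id"),
     ("book(id=1)", "ok")],
]
edges, visits = build_quotient_graph(rollouts)
```

In this toy example the two linear trajectories collapse into one graph with a shared search node that branches into a direct success path and an error-then-recovery path. Branching structure of this kind is what the topological metrics are described as measuring.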

Concepts Covered

Semantic Quotient Topology — mathematical framework for projecting interaction trajectories into structured manifolds
Reflective Recovery — prioritizing trajectories that demonstrate error correction and self-healing behaviors
Covariate Shift — distribution mismatch between training and test environments in agent deployment
Mode Collapse — agent overfitting to narrow behavioral patterns rather than learning diverse strategies
Signal-to-Noise Ratio — gradient quality metric for reinforcement learning optimization
Error Branch Ratio — proportion of decision branches leading to failure, indicating task structural complexity
Strategic Heterogeneity — diversity of valid solution paths within a task
Fisher Information Matrix — quantifies policy sensitivity to parameter changes in RL
Group Relative Policy Optimization (GRPO) — RL algorithm using group-relative advantages for policy updates
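
As a rough illustration of the group-relative advantages that GRPO uses, the sketch below normalizes each rollout's reward by the mean and standard deviation of its group. The function name, the `eps` stabilizer, and the choice of population standard deviation are assumptions; implementations differ in these details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of rollouts for the same task.

    eps is an assumed small constant that avoids division by zero when
    every rollout in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# A group with mixed outcomes yields non-zero advantages.
mixed = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# A group where every rollout succeeds yields all-zero advantages,
# so the task contributes no policy-gradient signal.
uniform = group_relative_advantages([1.0, 1.0, 1.0, 1.0])
```

The second call shows one way the gradient signal can vanish: when all rollouts for a task share the same outcome, every advantage is zero regardless of how the interactions unfolded.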

Images and Figures

Figure 1: Overview diagram showing three-stage TopoCurate framework (topological modeling, trajectory selection for SFT, task selection for RL)
Figure 2: Pass@k performance comparison across model scales and domains
Figure 3: Model behavior analysis comparing reflection rates, efficiency, and diversity metrics
Figure 4: RL training dynamics showing evaluation accuracy across domains
Figure 5: Training reward curves demonstrating impact of topological curation
