source: "raw/articles/topocurate-modeling-interaction-topology-for-tool-use-agent-training.md"

Summary: TopoCurate: Modeling Interaction Topology for Tool-Use Agent Training

TL;DR: TopoCurate transforms agent training data curation from outcome-based filtering to topological interaction modeling, using a semantic quotient topology to select high-quality training trajectories and RL tasks that maximize learning value.

Key Points

  • Core Problem: Standard tool-use agent training relies on outcome-based filtering (keeping only successful trajectories for SFT, applying pass-rate thresholds for RL). This ignores interaction dynamics and creates the "Outcome Equivalence Illusion", in which samples with the same outcome are treated as equally valuable
  • Two Key Issues: In SFT, selection bias toward successful trajectories causes covariate shift and mode collapse; in RL, pass-rate filtering treats tasks with identical pass rates as interchangeable even when their training potential differs vastly, which leads to vanishing gradients
  • Methodology: Projects multi-trial rollouts into a unified semantic quotient topology by merging equivalent action-observation states, transforming linear trajectories into structured manifolds
  • SFT Selection Metrics: Three topological metrics: Reflective Recovery (error correction), Semantic Efficiency (redundancy reduction), and Distributional Diversity (mode collapse prevention)
  • RL Selection Metrics: Error Branch Ratio and Strategic Heterogeneity to maximize gradient Signal-to-Noise Ratio
  • Results: Consistent gains of 4.2% (SFT) and 6.9% (RL) over baselines on BFCLv3 and Tau2 Bench
  • Theoretical Foundation: SFT minimizes KL divergence to robust expert policy; RL maximizes Fisher information of policy gradients
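
The merging step described under Methodology can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `state_key` normalization (lowercasing and trimming strings) is only a stand-in assumption for whatever semantic equivalence the authors use, and `build_quotient_graph`, `branching_nodes`, and the example tool names are hypothetical. The sketch merges equivalent (action, observation) steps from several rollouts of one task into shared graph nodes, then lists the nodes where rollouts diverge.

```python
from collections import defaultdict

# Hypothetical sentinel node marking the beginning of every rollout.
START = ("<start>", "")

def state_key(action, observation):
    """Stand-in equivalence test (an assumption, not the paper's method):
    two steps are 'equivalent' if they match after trimming and lowercasing."""
    return (action.strip().lower(), observation.strip().lower())

def build_quotient_graph(rollouts):
    """Merge equivalent steps across rollouts into shared nodes.

    rollouts: list of trajectories, each a list of (action, observation) pairs.
    Returns the set of merged nodes and a dict of edge -> traversal count.
    """
    nodes = set()
    edges = defaultdict(int)
    for trajectory in rollouts:
        prev = START
        for action, observation in trajectory:
            node = state_key(action, observation)
            nodes.add(node)
            edges[(prev, node)] += 1
            prev = node
    return nodes, dict(edges)

def branching_nodes(edges):
    """Nodes with more than one distinct successor, i.e. where rollouts diverge."""
    successors = defaultdict(set)
    for src, dst in edges:
        successors[src].add(dst)
    return sorted(src for src, dsts in successors.items() if len(dsts) > 1)

# Two made-up rollouts of the same task; both succeed, but one detours
# through an error and recovers.
rollouts = [
    [("search_flights", "ok"), ("book_flight", "error: missing id"),
     ("get_user", "ok"), ("book_flight", "ok")],
    [("Search_Flights ", "OK"), ("get_user", "ok"), ("book_flight", "ok")],
]

nodes, edges = build_quotient_graph(rollouts)
branches = branching_nodes(edges)
print(len(nodes), "merged nodes from", sum(len(t) for t in rollouts), "steps")
print("divergence points:", branches)
```

Both rollouts end in success, so an outcome-based filter would treat them identically, yet the merged graph shows that they split after the first step and that only one of them passes through an error state. Structure of this kind is what the topological metrics above are described as measuring.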

Concepts Covered

Images and Figures

  • Figure 1: Overview diagram showing three-stage TopoCurate framework (topological modeling, trajectory selection for SFT, task selection for RL)
  • Figure 2: Pass@k performance comparison across model scales and domains
  • Figure 3: Model behavior analysis comparing reflection rates, efficiency, and diversity metrics
  • Figure 4: RL training dynamics showing evaluation accuracy across domains
  • Figure 5: Training reward curves demonstrating impact of topological curation

Related Concepts