source: "raw/articles/hiper-hierarchical-reinforcement-learning-with-explicit-credit-assignment-for-la.md"

Summary: HiPER: Hierarchical Reinforcement Learning with Explicit Credit Assignment for Large Language Model Agents

TL;DR: HiPER introduces a hierarchical RL framework that separates high-level planning from low-level execution in LLM agents, achieving state-of-the-art performance on interactive benchmarks through explicit subgoal management and hierarchical advantage estimation.

Key Points

  • Proposes a Plan-Execute interface that decomposes each agent decision into three parts: switching (SWITCH/KEEP), subgoal generation, and action execution
  • Develops Hierarchical Advantage Estimation (HAE) that assigns credit at both planning and execution levels
  • Achieves 97.4% success on ALFWorld and 83.3% on WebShop with Qwen2.5-7B-Instruct (+6.6% and +8.3% over the best baseline, respectively)
  • Demonstrates 2.8× sample efficiency improvement over flat baselines on ALFWorld
  • Shows the largest improvements on multi-step tasks that chain several dependent subtasks
  • Proves that HAE yields unbiased gradient estimates and lower variance than flat Generalized Advantage Estimation (GAE)
  • Uses a single shared critic with two heads rather than separate critics, incurring only 0.8% memory overhead
  • Plan-Execute prompting itself improves baseline performance even without HAE
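
The three-part decision described in the first key point (switching, subgoal generation, action execution) can be sketched as a small control loop. This is a minimal illustration, not the paper's implementation: the class `PlanExecuteAgent`, its three callables, and the rule that the first step always switches are all assumptions made here, and the toy functions stand in for what would be LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

SWITCH, KEEP = "SWITCH", "KEEP"


@dataclass
class PlanExecuteAgent:
    # Hypothetical callables standing in for LLM calls (names are assumptions).
    decide_switch: Callable[[str, str], str]    # (observation, subgoal) -> SWITCH or KEEP
    propose_subgoal: Callable[[str], str]       # observation -> new subgoal
    choose_action: Callable[[str, str], str]    # (observation, subgoal) -> action
    subgoal: Optional[str] = None
    trace: List[Tuple[str, str, str]] = field(default_factory=list)

    def step(self, observation: str) -> str:
        # 1) Switching: assumed here to be forced on the first step,
        #    since there is no subgoal to keep yet.
        if self.subgoal is None:
            decision = SWITCH
        else:
            decision = self.decide_switch(observation, self.subgoal)

        # 2) Subgoal generation happens only when the decision is SWITCH.
        if decision == SWITCH:
            self.subgoal = self.propose_subgoal(observation)

        # 3) Action execution is conditioned on the current subgoal.
        action = self.choose_action(observation, self.subgoal)
        self.trace.append((decision, self.subgoal, action))
        return action


# Toy stand-ins for the three decisions (purely illustrative).
def toy_decide(observation: str, subgoal: str) -> str:
    return SWITCH if "done" in observation else KEEP


def toy_subgoal(observation: str) -> str:
    return "find mug" if "start" in observation else "heat mug"


def toy_action(observation: str, subgoal: str) -> str:
    return f"act[{subgoal}]"


agent = PlanExecuteAgent(toy_decide, toy_subgoal, toy_action)
actions = [agent.step(obs) for obs in ["start", "searching", "mug done"]]
decisions = [decision for decision, _, _ in agent.trace]
```

Recording the (decision, subgoal, action) triple at every step keeps the planning-level and execution-level choices separable, which is the kind of structure a two-level credit assignment scheme such as HAE would need as input.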

Concepts Covered

Related Concepts