source: "raw/articles/hiper-hierarchical-reinforcement-learning-with-explicit-credit-assignment-for-la.md"
Summary: HiPER: Hierarchical Reinforcement Learning with Explicit Credit Assignment for Large Language Model Agents
TL;DR: HiPER introduces a hierarchical RL framework that separates high-level planning from low-level execution in LLM agents, achieving state-of-the-art performance on interactive benchmarks through explicit subgoal management and hierarchical advantage estimation.
Key Points
- Proposes a Plan-Execute interface that decomposes each agent decision into a switching choice (SWITCH/KEEP), subgoal generation, and action execution
- Develops Hierarchical Advantage Estimation (HAE) that assigns credit at both planning and execution levels
- Achieves 97.4% success on ALFWorld and 83.3% on WebShop with Qwen2.5-7B-Instruct (+6.6% and +8.3% over the best baseline, respectively)
- Demonstrates 2.8× sample efficiency improvement over flat baselines on ALFWorld
- Shows largest improvements on multi-step sequential tasks requiring multiple dependent subtasks
- Proves HAE provides unbiased gradient estimation and variance reduction compared to flat GAE
- Uses a single shared critic with two heads rather than separate critics, incurring only 0.8% memory overhead
- Plan-Execute prompting itself improves baseline performance even without HAE
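
The SWITCH/KEEP decomposition described in the first key point can be pictured as a simple control loop. The following is a minimal sketch, not the paper's implementation: the names `Step`, `plan_execute_rollout`, `should_switch`, `propose_subgoal`, and `choose_action` are hypothetical, and the three callables stand in for what would be LLM calls. It also assumes the planner always proposes a subgoal at the first step.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    switched: bool  # True if the planner chose SWITCH at this step
    subgoal: str    # subgoal that was active when the action was chosen
    action: str     # low-level action conditioned on the active subgoal


def plan_execute_rollout(
    observations: List[str],
    should_switch: Callable[[str, str], bool],  # hypothetical: (obs, subgoal) -> SWITCH?
    propose_subgoal: Callable[[str], str],      # hypothetical: obs -> new subgoal
    choose_action: Callable[[str, str], str],   # hypothetical: (obs, subgoal) -> action
) -> List[Step]:
    """Walk a fixed observation sequence through the three-part decision."""
    steps: List[Step] = []
    subgoal = ""
    for t, obs in enumerate(observations):
        # 1) Switching: the first step always needs a subgoal (assumption).
        switched = t == 0 or should_switch(obs, subgoal)
        # 2) Subgoal generation: only when SWITCH was chosen; KEEP reuses it.
        if switched:
            subgoal = propose_subgoal(obs)
        # 3) Action execution: conditioned on the currently active subgoal.
        steps.append(Step(switched, subgoal, choose_action(obs, subgoal)))
    return steps
```

With toy callables such as `should_switch=lambda obs, sg: obs not in sg` and `propose_subgoal=lambda obs: f"explore {obs}"`, the rollout groups consecutive steps that share a subgoal into segments. Those segments are the units a hierarchical credit-assignment scheme would operate on.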
Concepts Covered
- Hierarchical Reinforcement Learning — extends options framework to open-vocabulary subgoals in LLM agents
- Credit Assignment — develops hierarchical advantage estimation aligned with temporal structure
- Policy Gradient Methods — derives factorized gradient for Plan-Execute policy structure
- LLM Agent Training — applies RL to multi-turn interactive decision-making with sparse rewards
- Temporal Abstraction — makes implicit hierarchical structure in agent behavior explicit
- Variance Reduction — proves HAE reduces variance through boundary bootstrapping and option conditioning
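
To make the contrast between flat and hierarchical credit assignment concrete, the sketch below computes standard flat GAE next to a segment-level advantage that bootstraps at subgoal boundaries. This summary does not give the HAE formula, so the segment-level estimator here is an assumption: it is the textbook one-step SMDP (options-style) advantage, shown only to illustrate what "boundary bootstrapping" could mean. The function names and the requirement that the trajectory starts with a SWITCH are likewise illustrative choices.

```python
from typing import List


def gae(rewards: List[float], values: List[float],
        gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Flat GAE. `values` has len(rewards) + 1 entries; the last is the bootstrap value."""
    adv = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # one-step TD error
        running = delta + gamma * lam * running
        adv[t] = running
    return adv


def segment_advantages(rewards: List[float], plan_values: List[float],
                       switches: List[bool], gamma: float = 0.99) -> List[float]:
    """One advantage per subgoal segment, bootstrapping at the next SWITCH boundary.

    Illustrative SMDP-style estimator, not the paper's HAE.
    `plan_values` has len(rewards) + 1 entries (planner-head value estimates).
    """
    if not switches or not switches[0]:
        raise ValueError("trajectory is assumed to start with a SWITCH")
    starts = [t for t, s in enumerate(switches) if s]
    bounds = starts + [len(rewards)]
    out = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        # Discounted reward collected while this subgoal was active.
        ret = sum(gamma ** (i - start) * rewards[i] for i in range(start, end))
        # Bootstrap from the planner value at the segment boundary.
        ret += gamma ** (end - start) * plan_values[end]
        out.append(ret - plan_values[start])
    return out
```

With a sparse terminal reward, flat GAE (at `gamma = lam = 1` and a zero critic) spreads the same advantage over every step, while the segment-level version assigns it to the subgoal segment in which the reward was earned.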