source: "raw/articles/hiper-hierarchical-reinforcement-learning-with-explicit-credit-assignment-for-la.md"

Summary: HiPER: Hierarchical Reinforcement Learning with Explicit Credit Assignment for Large Language Model Agents

TL;DR: HiPER introduces a hierarchical RL framework that separates high-level planning from low-level execution in LLM agents, achieving state-of-the-art performance on interactive benchmarks through explicit subgoal management and hierarchical advantage estimation.

Key Points

Proposes a Plan-Execute interface that decomposes each agent decision into three parts: a switching decision (SWITCH/KEEP), subgoal generation, and action execution
Develops Hierarchical Advantage Estimation (HAE) that assigns credit at both planning and execution levels
Achieves 97.4% success on ALFWorld and 83.3% on WebShop with Qwen2.5-7B-Instruct (+6.6% and +8.3% over the best baseline)
Demonstrates 2.8× sample efficiency improvement over flat baselines on ALFWorld
Shows largest improvements on multi-step sequential tasks requiring multiple dependent subtasks
Proves HAE provides unbiased gradient estimation and variance reduction compared to flat GAE
Uses a single shared critic with two heads rather than separate critics, incurring only 0.8% memory overhead
Plan-Execute prompting itself improves baseline performance even without HAE
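
The Plan-Execute decomposition listed above can be illustrated with a minimal sketch. It is not the paper's implementation: the names `Step`, `run_episode`, `decide_switch`, `propose_subgoal`, and `choose_action` are hypothetical, and the three callables stand in for what would be a single LLM policy in practice. The sketch only shows the control flow: at each turn the agent either keeps the current subgoal or switches to a new one, then acts conditioned on that subgoal.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Step:
    switch: bool   # True = SWITCH (a new subgoal was generated), False = KEEP
    subgoal: str   # subgoal active when the action was chosen
    action: str    # low-level action emitted for this turn


def run_episode(
    observations: Sequence[str],
    decide_switch: Callable[[str, str], bool],   # hypothetical: (obs, subgoal) -> SWITCH?
    propose_subgoal: Callable[[str], str],       # hypothetical: obs -> new subgoal text
    choose_action: Callable[[str, str], str],    # hypothetical: (obs, subgoal) -> action
) -> List[Step]:
    """Roll a Plan-Execute style loop over a fixed list of observations."""
    steps: List[Step] = []
    subgoal: Optional[str] = None
    for obs in observations:
        # The first turn has no subgoal yet, so it is always a SWITCH.
        switch = subgoal is None or decide_switch(obs, subgoal)
        if switch:
            subgoal = propose_subgoal(obs)
        action = choose_action(obs, subgoal)
        steps.append(Step(switch=switch, subgoal=subgoal, action=action))
    return steps
```

Recording the SWITCH/KEEP flag per step is what later lets a trainer split a trajectory into subgoal segments and assign credit to planning and execution separately.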

Concepts Covered

Hierarchical Reinforcement Learning — extends options framework to open-vocabulary subgoals in LLM agents
Credit Assignment — develops hierarchical advantage estimation aligned with temporal structure
Policy Gradient Methods — derives factorized gradient for Plan-Execute policy structure
LLM Agent Training — applies RL to multi-turn interactive decision-making with sparse rewards
Temporal Abstraction — makes implicit hierarchical structure in agent behavior explicit
Variance Reduction — proves HAE reduces variance through boundary bootstrapping and option conditioning
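
To make the contrast between flat and hierarchical credit assignment concrete, here is a small sketch. `flat_gae` is the standard Generalized Advantage Estimation recursion. `segment_advantages` is an assumption: it applies the same recursion at the subgoal level using the usual options/SMDP form, where each segment's discounted reward is bootstrapped from a high-level value at the next subgoal boundary. It is meant to convey the idea of boundary bootstrapping, not to reproduce the exact HAE estimator from the paper. The function names, the `boundaries` argument, and the default `gamma` and `lam` values are illustrative.

```python
from typing import List, Sequence


def flat_gae(rewards: Sequence[float], values: Sequence[float],
             gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Standard GAE. `values` has len(rewards) + 1 entries (last = bootstrap)."""
    adv = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # one-step TD error
        running = delta + gamma * lam * running
        adv[t] = running
    return adv


def segment_advantages(rewards: Sequence[float], boundaries: Sequence[int],
                       high_values: Sequence[float],
                       gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Subgoal-level advantages (assumed SMDP-style form, not the paper's exact HAE).

    boundaries:  sorted step indices where a new subgoal starts, beginning with 0.
    high_values: high-level value at each boundary, plus one final bootstrap value.
    """
    ends = list(boundaries[1:]) + [len(rewards)]
    adv = [0.0] * len(boundaries)
    running = 0.0
    for k in reversed(range(len(boundaries))):
        start, end = boundaries[k], ends[k]
        # Discounted reward collected while subgoal k was active.
        seg_return = sum(gamma ** (i - start) * rewards[i] for i in range(start, end))
        discount = gamma ** (end - start)  # discount across the whole segment
        delta = seg_return + discount * high_values[k + 1] - high_values[k]
        running = delta + discount * lam * running
        adv[k] = running
    return adv
```

With a sparse terminal reward, the flat estimator has to propagate credit through every step, while the segment-level estimator propagates it through only as many hops as there are subgoals, which is one intuition for why a hierarchical estimator can have lower variance.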
