State Abstractions
Summary: Theoretical framework in reinforcement learning that connects concept selection to approximate state abstraction, enabling the identification of decision-relevant features that preserve the optimal decision structure. Provides a mathematical foundation for automatically selecting interpretable concepts with performance guarantees.
Overview
State abstractions serve as the theoretical backbone for concept-based reinforcement learning, establishing when and how concept representations can maintain the essential decision-making structure of the original state space. The framework addresses a fundamental challenge in Reinforcement Learning Interpretability: how to reduce complex state spaces to human-interpretable concepts without losing critical information needed for optimal decision-making.
The core principle is that an effective state abstraction must preserve the action-value distinctions that determine which action is optimal. States mapped to the same concept representation should ideally share the same optimal action; otherwise the agent cannot tell them apart at exactly the point where the choice of action matters. This insight connects directly to Approximate Dynamic Programming theory, where abstraction quality is measured by how well the reduced representation maintains the structure of the underlying Markov Decision Process.
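A minimal sketch of the separation check, assuming a finite set of sampled states whose concept vectors and Q*-values are already known (the function name `separation_violations` and the toy data are hypothetical). States are grouped by concept vector, and any group whose states have different greedy actions marks a place where the abstraction loses decision structure.

```python
from collections import defaultdict

def separation_violations(concepts, q_values):
    """Return concept vectors whose states disagree on the greedy (optimal) action.

    concepts: list of tuples; concepts[i] is the concept vector of state i.
    q_values: list of lists; q_values[i][a] is Q*(state i, action a).
    Ties in Q are broken toward the lowest action index (a simplification).
    """
    actions_by_concept = defaultdict(set)
    for c, q in zip(concepts, q_values):
        best = max(range(len(q)), key=lambda a: q[a])  # greedy action for this state
        actions_by_concept[c].add(best)
    # A concept vector is "unsafe" if the states it merges need different actions.
    return [c for c, acts in actions_by_concept.items() if len(acts) > 1]

# Toy example: two binary concepts, two actions.
concepts = [(0, 1), (0, 1), (1, 0)]
q_values = [[1.0, 0.2], [0.1, 0.9], [0.5, 0.4]]
print(separation_violations(concepts, q_values))  # [(0, 1)]
```

The first two states share the concept vector (0, 1) but prefer different actions, so that vector is reported; adding a concept that distinguishes them would remove the violation.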
The mathematical foundation centers on the abstraction error ε, which quantifies how much action-value information is lost when states are compressed into concept representations. The framework bounds performance: a policy that acts on the selected concepts loses at most 2ε/(1-γ)² in value relative to the optimal policy, where γ ∈ [0, 1) is the discount factor. Because the bound grows linearly in ε, concepts with small abstraction error retain near-optimal performance while remaining interpretable; the 1/(1-γ)² factor, however, means the guarantee loosens quickly as γ approaches 1.
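The sketch below computes ε under one common definition, assumed here rather than taken from the source: the largest gap |Q*(s,a) - Q*(s',a)| over all actions and over all pairs of sampled states that share a concept vector. It then evaluates the 2ε/(1-γ)² bound. Function names and data are illustrative.

```python
from itertools import combinations

def abstraction_error(concepts, q_values):
    """Largest action-value gap between two states that share a concept vector.

    Assumed definition: eps = max |Q*(s,a) - Q*(s',a)| over all pairs (s, s')
    with identical concept vectors and over all actions a.
    """
    eps = 0.0
    for i, j in combinations(range(len(concepts)), 2):
        if concepts[i] == concepts[j]:
            gap = max(abs(qi - qj) for qi, qj in zip(q_values[i], q_values[j]))
            eps = max(eps, gap)
    return eps

def value_loss_bound(eps, gamma):
    """Upper bound 2 * eps / (1 - gamma)**2 on the value lost by the concept-based policy."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("discount factor must satisfy 0 <= gamma < 1")
    return 2.0 * eps / (1.0 - gamma) ** 2

concepts = [(0, 1), (0, 1), (1, 0)]
q_values = [[1.0, 0.2], [0.95, 0.25], [0.5, 0.4]]
eps = abstraction_error(concepts, q_values)  # approximately 0.05
print(value_loss_bound(eps, gamma=0.9))       # approximately 10.0
```

With γ = 0.9 the multiplier 2/(1-γ)² equals 200, so even ε = 0.05 permits a loss of up to 10 value units. The pairwise loop is quadratic in the number of sampled states, which is acceptable for a sketch but not for large state samples.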
Key Details
- Decision-Relevance Criterion: Concepts are decision-relevant if removing them would cause agents to confuse states requiring different optimal actions
- Separation Constraint: States with identical concept representations must share the same optimal action to preserve decision structure
- NP-Hard Complexity: The concept selection problem is NP-hard in general, but environmental constraints often limit the effective state space, making practical instances tractable
- Probabilistic Extensions: Framework extends to imperfect concept predictors through probabilistic separation constraints that account for prediction uncertainty
- Performance Guarantees: Theoretical bounds directly link abstraction quality to policy performance, enabling principled concept selection
- Empirical Validation: Successfully applied across diverse domains including control tasks (CartPole), grid worlds (MiniGrid), Atari games (Pong, Boxing), and healthcare applications
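To make the decision-relevance and separation criteria concrete, here is an exhaustive search for the smallest concept subset that satisfies the separation constraint. It is a hypothetical baseline, not the source's algorithm (which relies on mixed integer linear programming), and it may enumerate all 2^m subsets of m concepts, in line with the general hardness noted above. Because subsets are tried in order of size, every concept in the returned subset is decision-relevant: dropping any one of them gives a smaller subset that already failed the check.

```python
from itertools import combinations

def separates(subset, concepts, optimal_actions):
    """True if states that agree on the selected concepts also agree on the optimal action."""
    seen = {}
    for c, a in zip(concepts, optimal_actions):
        key = tuple(c[j] for j in subset)  # project the concept vector onto the subset
        if seen.setdefault(key, a) != a:
            return False
    return True

def smallest_decision_relevant_subset(concepts, optimal_actions):
    """Exhaustive search over concept subsets, smallest first (exponential time)."""
    if not concepts:
        return ()
    n_concepts = len(concepts[0])
    for size in range(n_concepts + 1):
        for subset in combinations(range(n_concepts), size):
            if separates(subset, concepts, optimal_actions):
                return subset
    return None  # even the full concept set cannot separate these states

# Optimal action is the XOR of concepts 0 and 1; concept 2 duplicates concept 0.
concepts = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1)]
optimal_actions = [0, 1, 1, 0]
print(smallest_decision_relevant_subset(concepts, optimal_actions))  # (0, 1)
```

No single concept separates the four states, so the search returns the pair (0, 1); the redundant concept 2 is never selected.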
Relationships
- Concept-Based Models — provides the architectural foundation that state abstractions optimize
- Decision-Relevant Concepts — defines the selection criteria derived from abstraction theory
- Test-Time Intervention — effectiveness depends on abstraction quality since better concepts enable more impactful human corrections
- Mixed Integer Linear Programming — optimization technique used to solve the concept selection problem under abstraction constraints
- Feature Selection — classical machine learning approach that state abstractions extend with RL-specific decision-relevance criteria
- State Space Abstraction — broader theoretical area that concept-based abstractions specialize for interpretability
- Abstraction Error — key metric quantifying how well concept representations preserve original state space structure
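The Mixed Integer Linear Programming relationship can be illustrated with one possible formulation, assuming a finite set of sampled states, binary concept vectors c(s) ∈ {0,1}^m, and known optimal actions π*(s). The binary variable z_j marks concept j as selected, and every pair of states with different optimal actions must be distinguished by at least one selected concept, which is exactly the separation constraint stated as a set cover. This is an illustrative integer program (a special case of MILP), not necessarily the source's exact model, which may add terms such as probabilistic constraints for imperfect concept predictors.

```latex
\begin{aligned}
\min_{z \in \{0,1\}^m} \quad & \sum_{j=1}^{m} z_j \\
\text{s.t.} \quad & \sum_{j \,:\, c_j(s) \neq c_j(s')} z_j \;\ge\; 1
  \qquad \forall\, (s, s') \text{ with } \pi^*(s) \neq \pi^*(s')
\end{aligned}
```

If two states have identical full concept vectors but different optimal actions, the corresponding sum is empty and the program is infeasible, signaling that the concept vocabulary itself is insufficient.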
Sources
- sources/selecting-decision-relevant-concepts-in-reinforcement-learning — introduced the theoretical framework connecting concept selection to state abstraction theory, provided performance bounds, and demonstrated practical algorithms for automatic concept selection