State Abstractions

Summary: A theoretical framework in reinforcement learning that connects concept selection to approximate state abstraction, enabling the identification of decision-relevant features that preserve the optimal decision structure. It provides a mathematical foundation for automatically selecting interpretable concepts while maintaining performance guarantees.

Overview

State abstractions serve as the theoretical backbone for concept-based reinforcement learning, establishing when and how concept representations can maintain the essential decision-making structure of the original state space. The framework addresses a fundamental challenge in Reinforcement Learning Interpretability: how to reduce complex state spaces to human-interpretable concepts without losing critical information needed for optimal decision-making.

The core principle is that effective state abstractions must preserve action-value differences across states. When states are mapped to the same concept representation, they should ideally share the same optimal action. This insight directly connects to Approximate Dynamic Programming theory, where abstraction quality is measured by how well the reduced representation maintains the underlying Markov Decision Process structure.

The mathematical foundation centers on abstraction error (ε), which quantifies how much action-value information is lost during concept-based state compression. The framework provides a performance bound: a policy that acts on the selected concepts loses at most 2ε/(1-γ)² in value relative to the optimal policy, where γ ∈ [0, 1) is the discount factor. Because the bound shrinks to zero as ε does, well-chosen concepts can keep performance near-optimal while enabling interpretability.
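
As a rough illustration of how ε and the bound relate, the sketch below assumes a tabular setting in which the optimal action values Q* are known. It takes ε to be the largest action-value gap between any two states that share a concept representation, which is one common definition in the approximate state abstraction literature; the source may define it differently. The names `q_star`, `concept_ids`, `abstraction_error`, and `value_loss_bound`, as well as the toy numbers, are hypothetical.

```python
import numpy as np


def abstraction_error(q_star: np.ndarray, concept_ids: np.ndarray) -> float:
    """Largest |Q*(s1, a) - Q*(s2, a)| over states s1, s2 that share a concept id.

    q_star: array of shape (n_states, n_actions) with optimal action values.
    concept_ids: array of shape (n_states,) giving each state's abstract cluster.
    """
    eps = 0.0
    for cid in np.unique(concept_ids):
        group = q_star[concept_ids == cid]  # rows for the states in this cluster
        # The per-action spread within a cluster equals its largest pairwise gap.
        spread = group.max(axis=0) - group.min(axis=0)
        eps = max(eps, float(spread.max()))
    return eps


def value_loss_bound(eps: float, gamma: float) -> float:
    """Bound 2 * eps / (1 - gamma)^2 from the text; requires 0 <= gamma < 1."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return 2.0 * eps / (1.0 - gamma) ** 2


# Toy example (made-up numbers): 4 states, 2 actions, 2 concept clusters.
q_star = np.array([
    [1.0, 0.5],
    [1.1, 0.4],  # shares cluster 0 with state 0
    [0.2, 0.9],
    [0.2, 1.0],  # shares cluster 1 with state 2
])
concept_ids = np.array([0, 0, 1, 1])

eps = abstraction_error(q_star, concept_ids)
bound = value_loss_bound(eps, gamma=0.9)
```

With these numbers ε is about 0.1, and γ = 0.9 gives a bound of about 20, which shows how quickly the guarantee loosens as γ approaches 1.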

Key Details

Decision-Relevance Criterion: Concepts are decision-relevant if removing them would cause agents to confuse states requiring different optimal actions
Separation Constraint: States with identical concept representations must share the same optimal action to preserve decision structure
NP-Hard Complexity: The concept selection problem is NP-hard in general, but environmental constraints often restrict the effective state space enough to make it tractable in practice
Probabilistic Extensions: Framework extends to imperfect concept predictors through probabilistic separation constraints that account for prediction uncertainty
Performance Guarantees: Theoretical bounds directly link abstraction quality to policy performance, enabling principled concept selection
Empirical Validation: Successfully applied across diverse domains including control tasks (CartPole), grid worlds (MiniGrid), Atari games (Pong, Boxing), and healthcare applications
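
The decision-relevance criterion and separation constraint above can be sketched as a small search problem. The example assumes binary concepts, a single known optimal action per state, and exhaustive search over concept subsets in order of size. It is a toy stand-in for illustration, not the optimization procedure used in the source, and all names (`satisfies_separation`, `smallest_separating_subset`, the toy data) are hypothetical.

```python
from itertools import combinations
from typing import Optional, Sequence, Tuple


def satisfies_separation(
    concepts: Sequence[Sequence[int]],
    optimal_actions: Sequence[int],
    subset: Tuple[int, ...],
) -> bool:
    """True if states that agree on every concept in `subset` share an optimal action."""
    seen = {}  # concept signature -> optimal action of the first state seen with it
    for row, action in zip(concepts, optimal_actions):
        signature = tuple(row[i] for i in subset)
        if seen.setdefault(signature, action) != action:
            return False  # two states collapse together but need different actions
    return True


def smallest_separating_subset(
    concepts: Sequence[Sequence[int]],
    optimal_actions: Sequence[int],
) -> Optional[Tuple[int, ...]]:
    """Exhaustively search for a minimum-size concept subset meeting the constraint.

    Runtime is exponential in the number of concepts, so this only suits toy problems.
    """
    n_concepts = len(concepts[0])
    for size in range(n_concepts + 1):
        for subset in combinations(range(n_concepts), size):
            if satisfies_separation(concepts, optimal_actions, subset):
                return subset
    return None  # even the full concept set confuses states with different actions


# Toy data (made up): 4 states described by 3 binary concepts.
concepts = [
    [0, 0, 1],
    [0, 1, 1],
    [1, 0, 0],
    [1, 1, 0],
]
optimal_actions = [0, 1, 0, 1]  # here the optimal action tracks concept 1 only

selected = smallest_separating_subset(concepts, optimal_actions)
```

On this toy data the search returns `(1,)`: dropping concept 1 would merge states that need different actions, so it is decision-relevant, while concepts 0 and 2 can be removed without violating the constraint.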

Relationships

Concept-Based Models — provides the architectural foundation that state abstractions optimize
Decision-Relevant Concepts — defines the selection criteria derived from abstraction theory
Test-Time Intervention — effectiveness depends on abstraction quality since better concepts enable more impactful human corrections
Mixed Integer Linear Programming — optimization technique used to solve the concept selection problem under abstraction constraints
Feature Selection — classical machine learning approach that state abstractions extend with RL-specific decision-relevance criteria
State Space Abstraction — broader theoretical area that concept-based abstractions specialize for interpretability
Abstraction Error — key metric quantifying how well concept representations preserve original state space structure

Sources

sources/selecting-decision-relevant-concepts-in-reinforcement-learning — introduced the theoretical framework connecting concept selection to state abstraction theory, provided performance bounds, and demonstrated practical algorithms for automatic concept selection