Abstraction Error

Summary: A metric measuring the maximum difference in optimal Q-values between states that share identical concept representations in reinforcement learning. It quantifies how well concept-based state abstractions preserve the decision structure of the underlying Markov Decision Process, serving as the primary optimization criterion for automated concept selection algorithms that maintain performance guarantees.

Overview

Abstraction error is a fundamental metric in Concept-Based Models and State Abstraction theory that measures how well decision structure is preserved when states are grouped by their concept representations. It is formally defined as:

ε(g_c) = max_{s,s': g_c(s)=g_c(s')} max_a |Q*(s,a) - Q*(s',a)|

Where:

g_c is the concept predictor function mapping states to concept representations
s and s' are states with identical concept representations under g_c
Q*(s,a) is the optimal Q-value for taking action a in state s

The abstraction error captures the worst-case Q-value difference between any two states that the concept predictor groups together. It directly implements the Decision-Relevant Concepts principle: states requiring different optimal actions should keep distinct concept representations. When the abstraction error is zero, every pair of grouped states has identical optimal Q-values for every action, so no two states requiring different optimal actions share a concept representation and the decision structure is preserved exactly.
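The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the source's implementation: it assumes a small tabular problem where the optimal Q-values are already known, and the names `abstraction_error`, `q_star`, and `concepts`, as well as the four-state example, are hypothetical. Within each group of states that share a concept representation, the largest pairwise gap for a given action equals the maximum minus the minimum Q-value, which avoids enumerating pairs explicitly.

```python
from collections import defaultdict


def abstraction_error(q_star, concepts):
    """Worst-case |Q*(s,a) - Q*(s',a)| over states with equal concepts.

    q_star:   dict mapping state -> list of optimal Q-values (one per action)
    concepts: dict mapping state -> hashable concept representation g_c(s)
    """
    # Group the Q-value rows of all states that share a concept representation.
    groups = defaultdict(list)
    for state, q_values in q_star.items():
        groups[concepts[state]].append(q_values)

    error = 0.0
    for members in groups.values():
        # zip(*members) yields, per action, the Q-values of every grouped state.
        for per_action in zip(*members):
            # Largest pairwise gap within the group is max minus min.
            error = max(error, max(per_action) - min(per_action))
    return error


# Hypothetical 4-state, 2-action example (values invented for illustration).
q_star = {
    "s0": [1.0, 0.0],
    "s1": [0.9, 0.1],
    "s2": [0.0, 1.0],
    "s3": [0.2, 0.7],
}
# Groups states whose optimal actions agree.
coarse = {"s0": ("left",), "s1": ("left",), "s2": ("right",), "s3": ("right",)}
# Collapses every state into a single concept representation.
merged = {state: ("any",) for state in q_star}

print(abstraction_error(q_star, coarse))  # about 0.3
print(abstraction_error(q_star, merged))  # 1.0
```

The `coarse` mapping keeps states with the same optimal action together and yields a small error, while `merged` groups states with opposite optimal actions and yields the largest possible gap in this example.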

[Figure: Concept-based model architecture]

Key Details

Performance Guarantees: Bounds the policy performance loss by V^{π*}(s) - V^{π_c*}(s) ≤ 2ε/(1-γ)², where ε is the abstraction error, γ is the discount factor, π* is the optimal policy, and π_c* is the optimal policy defined over concept representations
Optimization Target: The Decision-Relevant Selection (DRS) algorithm directly minimizes abstraction error through Mixed-Integer Linear Programming formulations, selecting concept sets automatically without manual domain expertise
Computational Complexity: Selecting concepts to minimize abstraction error is NP-hard in general, but environmental constraints limit the effective state space, which keeps practical instances tractable
Measurement Requirements: Computing exact abstraction error requires knowledge of optimal Q-values, making it primarily useful for algorithm design rather than runtime evaluation
Empirical Validation: Lower abstraction error consistently correlates with better policy performance across diverse environments including CartPole, MiniGrid, Pong, Boxing, and healthcare glucose management
Predictor Dependencies: Final performance depends on both minimizing abstraction error during concept selection and achieving high accuracy in concept prediction during deployment
Extension to Imperfect Predictors: The DRS-log algorithm extends abstraction error minimization to scenarios with probabilistic concept predictors using separation constraints
Automatic Recovery: By minimizing abstraction error, the DRS algorithm automatically recovers manually curated concept sets while matching or exceeding their performance
Intervention Effectiveness: Decision-relevant concepts selected by minimizing abstraction error improve Test-Time Intervention effectiveness by 40-87% across environments compared to baseline selection methods
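The selection objective and the performance bound listed above can be illustrated with a small sketch. The source describes a Mixed-Integer Linear Programming formulation; the code below swaps in exhaustive search over concept subsets purely for illustration, which is only feasible for very small candidate sets. All names (`subset_error`, `select_concepts`, `value_loss_bound`) and the toy data are hypothetical, and optimal Q-values are assumed to be known.

```python
from itertools import combinations


def subset_error(q_star, concept_rows, subset):
    """Abstraction error when only the concept indices in `subset` are kept."""
    groups = {}
    for state, q_values in q_star.items():
        # States with equal values on the kept concepts fall into one group.
        key = tuple(concept_rows[state][i] for i in subset)
        groups.setdefault(key, []).append(q_values)
    return max(
        max(per_action) - min(per_action)
        for members in groups.values()
        for per_action in zip(*members)
    )


def select_concepts(q_star, concept_rows, num_concepts, budget):
    """Brute-force stand-in for MILP: best subset of exactly `budget` concepts."""
    best = min(
        combinations(range(num_concepts), budget),
        key=lambda subset: subset_error(q_star, concept_rows, subset),
    )
    return best, subset_error(q_star, concept_rows, best)


def value_loss_bound(epsilon, gamma):
    """Upper bound 2*epsilon / (1 - gamma)^2 on the policy performance loss."""
    return 2.0 * epsilon / (1.0 - gamma) ** 2


# Hypothetical data: 4 states, 2 actions, 3 candidate binary concepts.
q_star = {
    "s0": [1.0, 0.0],
    "s1": [1.0, 0.0],
    "s2": [0.0, 1.0],
    "s3": [0.0, 1.0],
}
concept_rows = {
    "s0": (0, 0, 1),
    "s1": (0, 1, 0),
    "s2": (1, 0, 0),
    "s3": (1, 1, 1),
}

subset, epsilon = select_concepts(q_star, concept_rows, num_concepts=3, budget=1)
print(subset, epsilon)               # concept 0 alone separates the two action groups
print(value_loss_bound(0.05, 0.9))   # roughly 10
```

In this toy example only concept 0 separates the states that prefer different actions, so it is selected with zero abstraction error. The bound call shows how quickly the guarantee loosens as γ approaches 1: an abstraction error of 0.05 at γ = 0.9 already permits a value loss of about 10.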

[Figure: Performance with different concept accuracies]

Relationships

Decision-Relevant Concepts — Abstraction error operationalizes the core principle that concepts should distinguish states requiring different optimal actions
State Abstraction — Provides theoretical foundation for abstraction error as a quality metric for approximate state representations in reinforcement learning
Q-Distance — Related metric measuring Q-value differences between states, with abstraction error focusing specifically on concept-grouped states
Mixed-Integer Linear Programming — Enables tractable optimization formulations for minimizing abstraction error with combinatorial concept selection constraints
Concept-Based Models — Abstraction error serves as the primary evaluation metric for concept quality in interpretable reinforcement learning architectures
Test-Time Intervention — Lower abstraction error amplifies the effectiveness of human corrections to concept predictions by ensuring interventions target decision-relevant distinctions
Markov Decision Processes — Abstraction error measures how well concept representations preserve the underlying MDP structure for optimal decision making
Feature Selection — Concept selection algorithms use abstraction error rather than traditional feature importance metrics, focusing on decision preservation rather than prediction accuracy
Interpretable Reinforcement Learning — Abstraction error bridges the gap between interpretability goals and performance guarantees in explainable RL systems
Approximate State Abstraction — Abstraction error quantifies the quality of approximate abstractions by measuring decision structure preservation
Policy Optimization — Provides performance bounds connecting abstraction error to final policy quality and value function preservation

Sources

sources/selecting-decision-relevant-concepts-in-reinforcement-learning — Primary source defining abstraction error and demonstrating its central role in the Decision-Relevant Selection algorithm for automatic concept selection with theoretical performance bounds