Decision-Relevant Concepts
Summary: Concepts that distinguish states requiring different optimal actions in reinforcement learning systems. States with identical concept representations must share identical optimal actions to preserve decision structure and enable interpretable yet optimal policies with theoretical performance guarantees.
Overview
Decision-relevant concepts form the theoretical foundation for interpretable reinforcement learning by ensuring that human-understandable features directly support optimal decision-making. Unlike arbitrary features that may correlate with outcomes, decision-relevant concepts must satisfy a strict criterion: any two states that share identical concept representations must also share identical optimal actions.
The formal definition relies on State Abstraction theory, where concepts define an abstraction function g(s) that maps states to concept representations. In its exact form, decision-relevance requires that for any states s and s' with g(s) = g(s'), the optimal Q-values satisfy Q*(s,a) = Q*(s',a) for all actions a, which in turn implies that s and s' share the same optimal actions. Intuitively, a concept is decision-relevant if removing it would cause the agent to confuse states that require different actions.
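The criterion can be checked directly on a small tabular problem. The following is a minimal sketch, assuming a hypothetical Q*-table stored as a dict from state to a tuple of action-values and a hypothetical abstraction function `g` that returns a hashable concept representation; the names `is_decision_relevant`, `left_half`, and `parity` are illustrative and do not come from the source.

```python
from collections import defaultdict

def is_decision_relevant(q_star, g, tol=1e-9):
    """Return True if all states with the same g(s) have matching Q*-values."""
    # Group states by their concept representation.
    groups = defaultdict(list)
    for state in q_star:
        groups[g(state)].append(state)
    # Within each group, compare every state's Q*-values to the first state's.
    for states in groups.values():
        reference = q_star[states[0]]
        for s in states[1:]:
            if any(abs(q - r) > tol for q, r in zip(q_star[s], reference)):
                return False
    return True

# Hypothetical toy problem: 4 states, 2 actions (left, right).
q_star = {
    0: (1.0, 0.0),
    1: (1.0, 0.0),
    2: (0.0, 1.0),
    3: (0.0, 1.0),
}

left_half = lambda s: (s < 2,)   # separates states that need different actions
parity = lambda s: (s % 2,)      # merges states that need different actions

left_half_ok = is_decision_relevant(q_star, left_half)
parity_ok = is_decision_relevant(q_star, parity)
```

In this toy setting `left_half` satisfies the criterion, while `parity` does not, because it assigns states 0 and 2 the same representation even though their Q*-values differ.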
This principle addresses fundamental challenges in Concept-Based Models where manually selected concepts often fail to capture decision-critical information. Manual concept selection is costly, requires domain expertise, scales poorly, and provides no performance guarantees. The decision-relevance constraint provides both theoretical guarantees and practical performance improvements by aligning human interpretability with algorithmic optimality.

Key Details
Mathematical Foundation:
- Abstraction Error: ε(g_c) = max_{s,s': g_c(s)=g_c(s')} max_a |Q*(s,a) - Q*(s',a)|, where g_c is the abstraction function induced by the selected concepts
- Performance bound: V^{π*}(s) - V^{π_c*}(s) ≤ 2ε(g_c)/(1-γ)², where γ is the discount factor
- Optimal concept selection minimizes the abstraction error while ensuring that states with the same concept representation share optimal actions
- Q-Distance metric measures difference in action-values between states to identify critical concept distinctions
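The quantities above can be computed on a tabular example. This is a minimal sketch under stated assumptions: the Q*-table and the abstraction `g_c` are hypothetical toy data, and `q_distance` uses the max-over-actions form that appears inside the abstraction error formula, which may differ from the exact Q-Distance definition in the source.

```python
from collections import defaultdict
from itertools import combinations

def q_distance(q_star, s, t):
    """Largest gap in optimal action-values between two states (assumed form)."""
    return max(abs(a - b) for a, b in zip(q_star[s], q_star[t]))

def abstraction_error(q_star, g_c):
    """Max Q-distance over all state pairs that share a concept representation."""
    groups = defaultdict(list)
    for s in q_star:
        groups[g_c(s)].append(s)
    error = 0.0
    for states in groups.values():
        for s, t in combinations(states, 2):
            error = max(error, q_distance(q_star, s, t))
    return error

def performance_bound(eps, gamma):
    """Upper bound on value loss: 2 * eps / (1 - gamma)^2."""
    return 2.0 * eps / (1.0 - gamma) ** 2

# Hypothetical toy data: s0 and s1 are merged by g_c, s2 stays separate.
q_star = {"s0": (1.0, 0.0), "s1": (0.5, 0.0), "s2": (0.0, 2.0)}
g_c = lambda s: ("left",) if s in ("s0", "s1") else ("right",)

eps = abstraction_error(q_star, g_c)         # 0.5
bound = performance_bound(eps, gamma=0.5)    # 4.0
```

Because of the (1-γ)² denominator, the bound grows quickly as γ approaches 1, so even a small abstraction error can translate into a loose guarantee in long-horizon tasks.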
Decision-Relevant Selection (DRS) Algorithm:
- First automated algorithm for selecting human-interpretable concepts for Concept-Based Models
- Formulated as a Mixed Integer Linear Programming problem with O(n_d² + K) variables, where n_d is the number of distinct abstract states and K is the total number of available concepts
- The problem is NP-hard but remains tractable in practice because environmental constraints limit the effective state space
- DRS-log variant handles imperfect concept predictors using probabilistic separation constraints
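To make the selection objective concrete, the sketch below picks the concept subset with the lowest abstraction error by exhaustive search. It is not the DRS Mixed Integer Linear Programming formulation from the source; brute force is swapped in as a simple stand-in that only works for a handful of concepts. The function names, the fixed `budget` on the number of selected concepts, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

def abstraction_error(q_star, concepts, subset):
    """Abstraction error when only the concepts at indices `subset` are kept."""
    groups = defaultdict(list)
    for s, vector in concepts.items():
        groups[tuple(vector[i] for i in subset)].append(s)
    error = 0.0
    for states in groups.values():
        for s, t in combinations(states, 2):
            gap = max(abs(a - b) for a, b in zip(q_star[s], q_star[t]))
            error = max(error, gap)
    return error

def select_concepts(q_star, concepts, budget):
    """Brute-force stand-in for DRS: try every subset of size `budget`."""
    n_concepts = len(next(iter(concepts.values())))
    best_subset, best_error = None, float("inf")
    for subset in combinations(range(n_concepts), budget):
        error = abstraction_error(q_star, concepts, subset)
        if error < best_error:
            best_subset, best_error = subset, error
    return best_subset, best_error

# Hypothetical toy data: 4 states, 2 actions, 3 candidate binary concepts.
q_star = {0: (1.0, 0.0), 1: (1.0, 0.0), 2: (0.0, 1.0), 3: (0.0, 1.0)}
concepts = {
    0: (0, 0, 0),
    1: (0, 1, 1),
    2: (1, 0, 1),
    3: (1, 1, 0),
}

best_subset, best_error = select_concepts(q_star, concepts, budget=1)
```

Here only concept 0 separates the states that need different actions, so it is the single concept selected with zero abstraction error. The search visits every subset of the given size, which is why an optimization formulation such as the MILP described above is needed beyond toy sizes.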
Empirical Performance:
- DRS automatically recovers manually curated concept sets while matching or exceeding their performance
- Demonstrates a 159% improvement over baselines in the CartPole environment
- Yields a 40-87% improvement in Test-Time Intervention effectiveness across environments
- Validated across CartPole, MiniGrid, Pong, Boxing, and real-world glucose management tasks
- Outperforms random, variance, and greedy baselines in 4 of 5 environments when concept predictors are perfect

Implementation Considerations:
- Requires pre-computed Q-values or a policy to define the optimization objective
- Scales with the number of distinct abstract states, not the total state space size
- Can incorporate concept prediction uncertainty through the probabilistic DRS-log formulation
- Training curves show that concept accuracy affects learning efficiency, while the number of concepts affects the maximum achievable performance
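The scaling point can be illustrated by collapsing raw states into distinct abstract states before any optimization. This is a minimal sketch, assuming that a "distinct abstract state" means a distinct value of the full candidate concept vector; that reading, the function name, and the toy data are assumptions rather than details from the source.

```python
def distinct_abstract_states(concepts):
    """Group raw states by their full concept vector."""
    groups = {}
    for state, vector in concepts.items():
        groups.setdefault(vector, []).append(state)
    return groups

# Hypothetical toy data: 6 raw states described by 2 binary concepts.
concepts = {
    "s0": (0, 0), "s1": (0, 0), "s2": (0, 1),
    "s3": (0, 1), "s4": (1, 1), "s5": (1, 1),
}

groups = distinct_abstract_states(concepts)
n_raw = len(concepts)   # number of raw states
n_d = len(groups)       # number of distinct abstract states
```

In this example six raw states collapse to three abstract states, so a selection problem defined over abstract states is smaller than one defined over the raw state space.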

Relationships
- State Abstraction — Provides theoretical framework for measuring concept quality and defining abstraction functions that preserve decision-relevant information
- Abstraction Error — Key metric for evaluating decision-relevance of concept sets and primary optimization objective in DRS algorithm
- Concept-Based Models — Primary application domain where decision-relevant concepts enable interpretable RL policies with performance guarantees
- Test-Time Intervention — Benefits significantly from well-selected decision-relevant concepts, enabling effective human oversight through meaningful concept corrections
- Q-Learning — Optimal Q-values define the decision-relevance constraint and provide foundation for performance bounds
- Reinforcement Learning — Core domain where decision-relevant concepts ensure interpretable models maintain optimal performance
- Mixed Integer Linear Programming — Optimization framework for automated concept selection in DRS algorithm
- Feature Selection — Decision-relevance provides principled criterion beyond correlation-based selection methods
- Interpretable Reinforcement Learning — Broader field where decision-relevant concepts provide theoretically-grounded approach to interpretability
- Markov Decision Processes — Underlying mathematical framework where decision-relevance constraint applies to state-action value functions
- Policy Optimization — Benefits from decision-relevant concepts that preserve optimal action selection while maintaining interpretability
- Human-AI Interaction — Enables effective human oversight through meaningful concept interventions during policy deployment
- Concept Bottleneck Models — Architecture that benefits from decision-relevant concept selection for interpretable intermediate representations
- Interpretable Machine Learning — Related field where decision-relevance provides performance-preserving interpretability constraints
Sources
- sources/selecting-decision-relevant-concepts-in-reinforcement-learning — Introduced DRS algorithm with theoretical foundations, performance bounds, empirical validation across multiple environments, test-time intervention analysis, and comprehensive comparison with baseline methods