source: "raw/articles/selecting-decision-relevant-concepts-in-reinforcement-learning.md"
Selecting Decision-Relevant Concepts in Reinforcement Learning
TL;DR: Proposes the first automated algorithms for selecting interpretable concepts in reinforcement learning by viewing concept selection through state abstraction theory and minimizing abstraction error to preserve decision-making performance.
Key Points
- Manual concept selection for interpretable RL is costly, requires domain expertise, and provides no performance guarantees
- Introduces the Decision-Relevant Selection (DRS) algorithm, which automatically selects concepts by minimizing state abstraction error
- Key insight: concepts are decision-relevant if removing them would cause agents to confuse states requiring different actions
- Provides performance bounds showing that concept-based policies using decision-relevant concepts achieve near-optimal performance
- Shows decision-relevant concepts improve test-time intervention effectiveness by 40-87% across environments
- Empirically validates on CartPole, MiniGrid, Pong, Boxing, and real-world glucose management tasks
- DRS can automatically recover manually curated concept sets while matching or exceeding their performance
- Proves that the concept selection problem is NP-hard, but provides tractable approximation algorithms
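The selection idea in the points above can be illustrated with a small greedy sketch. This is not the paper's DRS algorithm. It is a minimal illustration under stated assumptions: Q-values are already known for a handful of states, Q-distance is taken as the largest gap over actions, abstraction error is the worst Q-distance between two states that share the same selected-concept values, and concepts are added greedily. The names `q_distance`, `abstraction_error`, `greedy_select`, `CONCEPTS`, and `Q_VALUES` are hypothetical.

```python
from itertools import combinations


def q_distance(q_a, q_b):
    """Largest gap in action-values between two states (assumed definition)."""
    return max(abs(x - y) for x, y in zip(q_a, q_b))


def abstraction_error(concepts, q_values, selected):
    """Worst Q-distance between two states the selected concepts cannot tell apart."""
    groups = {}
    for state, row in enumerate(concepts):
        # States with identical values on the selected concepts collapse together.
        key = tuple(row[c] for c in selected)
        groups.setdefault(key, []).append(state)
    return max(
        (q_distance(q_values[i], q_values[j])
         for members in groups.values()
         for i, j in combinations(members, 2)),
        default=0.0,
    )


def greedy_select(concepts, q_values, k):
    """Add, one at a time, the concept that gives the lowest abstraction error."""
    selected = []
    remaining = list(range(len(concepts[0])))
    for _ in range(min(k, len(remaining))):
        best = min(
            remaining,
            key=lambda c: abstraction_error(concepts, q_values, selected + [c]),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (made up for illustration): 4 states, 3 binary concepts, 2 actions.
# Concept 0 separates states that prefer action 0 from states that prefer action 1.
CONCEPTS = [
    [0, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
    [1, 1, 0],
]
Q_VALUES = [
    [1.0, 0.0],
    [1.0, 0.1],
    [0.0, 1.0],
    [0.1, 1.0],
]
```

On this toy data, `greedy_select(CONCEPTS, Q_VALUES, 1)` picks concept 0, because dropping it would merge states whose best actions differ. The greedy loop is only a stand-in for the approximation algorithms mentioned above and comes with no guarantee here.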
Concepts Covered
- Concept-Based Models — core interpretable RL framework mapping states to human-understandable concepts then to actions
- State Abstraction — theoretical foundation for grouping states with similar decision consequences
- Decision-Relevant Concepts — concepts that distinguish states requiring different optimal actions
- Abstraction Error — measure of how well concept representations preserve decision structure
- Test-Time Intervention — human correction of concept predictions during deployment
- Interpretable Reinforcement Learning — RL methods that provide human-understandable decision processes
- Concept Selection — automated process of choosing optimal subset of concepts from candidate bank
- Q-Distance — metric measuring difference in action-values between states
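To make the concept-based model and test-time intervention entries concrete, here is a minimal sketch of a policy that routes a state through named concepts before choosing an action, with an optional human correction applied in between. Everything in it is an assumption for illustration: the CartPole-like state, the concept name `pole_leaning_right`, the deliberately biased predictor, and the function names are hypothetical and do not come from the paper.

```python
def noisy_predictor(state):
    """Hypothetical concept predictor with a biased threshold.

    Small positive angles are misread as "not leaning right".
    """
    return {"pole_leaning_right": state["pole_angle"] > 0.05}


def concept_to_action(concepts):
    """Hypothetical policy head that sees only the concepts, never the raw state."""
    return "push_right" if concepts["pole_leaning_right"] else "push_left"


def act(state, predictor, policy, corrections=None):
    """Map state -> concepts -> action, letting a human overwrite concept values."""
    concepts = dict(predictor(state))
    if corrections:
        # Test-time intervention: replace predicted concept values with corrected ones.
        concepts.update(corrections)
    return policy(concepts), concepts


# The pole leans slightly right, but the biased predictor misses it.
state = {"pole_angle": 0.02}
action_without, _ = act(state, noisy_predictor, concept_to_action)
action_with, _ = act(
    state,
    noisy_predictor,
    concept_to_action,
    corrections={"pole_leaning_right": True},
)
```

Because the action depends only on the concepts, correcting a single mispredicted concept changes the action from `push_left` to `push_right`. An intervention can only have this effect on concepts the policy actually relies on.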
Images and Figures
- 2604.04808v1/x1.png — Shows standard manual concept selection pipeline with iterative refinement
- 2604.04808v1/x2.png — Illustrates concept-based model architecture and decision-relevance principle
- 2604.04808v1/x3.png and 2604.04808v1/x4.png — Performance comparison of DRS vs baselines with perfect/imperfect concept predictors
- 2604.04808v1/x5.png — Training efficiency vs concept accuracy and number of concepts in MiniGrid
- 2604.04808v1/x6.png — Impact of concept accuracy vs number on performance in CartPole
- 2604.04808v1/x7.png — Test-time intervention effectiveness across environments
- 2604.04808v1/x8.png — CUB dataset results showing DRS matches manual concept selection