source: "raw/articles/selecting-decision-relevant-concepts-in-reinforcement-learning.md"

Selecting Decision-Relevant Concepts in Reinforcement Learning

TL;DR: Proposes the first automated algorithms for selecting interpretable concepts in reinforcement learning by viewing concept selection through state abstraction theory and minimizing abstraction error to preserve decision-making performance.

Key Points

Manual concept selection for interpretable RL is costly, requires domain expertise, and provides no performance guarantees
Introduces the Decision-Relevant Selection (DRS) algorithm, which automatically selects concepts by minimizing state abstraction error
Key insight: concepts are decision-relevant if removing them would cause agents to confuse states requiring different actions
Provides performance bounds showing that concept-based policies using decision-relevant concepts achieve near-optimal performance
Shows decision-relevant concepts improve test-time intervention effectiveness by 40-87% across environments
Empirically validates on CartPole, MiniGrid, Pong, Boxing, and real-world glucose management tasks
DRS can automatically recover manually curated concept sets while matching or exceeding their performance
Proves the concept selection problem is NP-hard but provides tractable approximation algorithms
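
The selection idea in the points above can be illustrated with a toy example. This page does not describe the paper's actual DRS procedure or its approximation algorithms, so the sketch below is a hypothetical greedy forward selection. It assumes a small tabular problem where each state is a tuple of binary concept values and the optimal action for every state is known. A candidate subset is scored by counting state pairs that the subset cannot tell apart but that require different actions, which is the "confusion" named in the key insight. The names `confusions`, `greedy_select`, and the three toy concepts are illustrative only.

```python
from itertools import combinations, product

# Hypothetical toy problem: each state is a tuple of three binary concepts
# (obstacle_ahead, goal_on_left, wall_color). Only the first two matter.
STATES = list(product([0, 1], repeat=3))


def toy_best_action(state):
    """Assumed optimal action for the toy task; wall_color is ignored."""
    obstacle_ahead, goal_on_left, _wall_color = state
    if obstacle_ahead:
        return "turn"
    return "left" if goal_on_left else "forward"


BEST_ACTION = {s: toy_best_action(s) for s in STATES}


def confusions(states, best_action, subset):
    """Count state pairs that look identical under `subset` but need different actions."""
    count = 0
    for s1, s2 in combinations(states, 2):
        indistinguishable = all(s1[c] == s2[c] for c in subset)
        if indistinguishable and best_action[s1] != best_action[s2]:
            count += 1
    return count


def greedy_select(states, best_action, num_concepts, budget):
    """Greedy forward selection: add the concept that most reduces confusions.

    A heuristic only; it carries no optimality guarantee for the NP-hard problem.
    """
    selected = []
    for _ in range(budget):
        remaining = [c for c in range(num_concepts) if c not in selected]
        if not remaining:
            break
        best = min(remaining, key=lambda c: confusions(states, best_action, selected + [c]))
        selected.append(best)
        if confusions(states, best_action, selected) == 0:
            break  # every pair of states needing different actions is now separated
    return selected


print(greedy_select(STATES, BEST_ACTION, num_concepts=3, budget=2))  # → [0, 1]
```

On this toy problem the greedy pass keeps `obstacle_ahead` and `goal_on_left` and never picks `wall_color`, because adding it cannot separate any pair of states that need different actions.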

Concepts Covered

Concept-Based Models — core interpretable RL framework that maps states to human-understandable concepts, then maps those concepts to actions
State Abstraction — theoretical foundation for grouping states with similar decision consequences
Decision-Relevant Concepts — concepts that distinguish states requiring different optimal actions
Abstraction Error — measure of how well concept representations preserve decision structure
Test-Time Intervention — human correction of concept predictions during deployment
Interpretable Reinforcement Learning — RL methods that provide human-understandable decision processes
Concept Selection — automated process of choosing an optimal subset of concepts from a candidate bank
Q-Distance — metric measuring difference in action-values between states
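
Q-distance and abstraction error can be made concrete in a few lines. This page does not give the paper's exact definitions, so the sketch assumes the max-norm form common in approximate state abstraction: the Q-distance between two states is the largest gap between their action values, and the abstraction error of a concept subset is the largest Q-distance between any two states that map to the same concept vector. The states, Q-values, and concept names below are made-up toy data.

```python
from itertools import combinations

# Made-up toy data: two actions (push_left, push_right) and two binary concepts
# (pole_leans_right, background_dark). Only the first concept changes Q-values much.
Q = {
    "s0": [1.0, 0.2],
    "s1": [0.9, 0.3],
    "s2": [0.2, 1.0],
    "s3": [0.3, 0.8],
}
CONCEPTS = {
    "s0": (0, 0),
    "s1": (0, 1),
    "s2": (1, 0),
    "s3": (1, 1),
}


def q_distance(q, s1, s2):
    """Assumed Q-distance: max over actions of |Q(s1, a) - Q(s2, a)|."""
    return max(abs(a - b) for a, b in zip(q[s1], q[s2]))


def abstraction_error(q, concepts, subset):
    """Largest Q-distance between states that share a concept vector under `subset`."""
    groups = {}
    for state, vector in concepts.items():
        key = tuple(vector[c] for c in subset)
        groups.setdefault(key, []).append(state)
    return max(
        (q_distance(q, s1, s2)
         for members in groups.values()
         for s1, s2 in combinations(members, 2)),
        default=0.0,  # no two states collide, so nothing is confused
    )


for subset in ([0], [1], [0, 1]):
    print(subset, round(abstraction_error(Q, CONCEPTS, subset), 3))
```

Keeping only `pole_leans_right` gives an abstraction error of about 0.2, while keeping only `background_dark` gives 0.8, so under this toy criterion the first concept is the decision-relevant one; keeping both drives the error to zero.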

Images and Figures

2604.04808v1/x1.png — Shows standard manual concept selection pipeline with iterative refinement
2604.04808v1/x2.png — Illustrates concept-based model architecture and decision-relevance principle
2604.04808v1/x3.png and 2604.04808v1/x4.png — Performance comparison of DRS vs baselines with perfect/imperfect concept predictors
2604.04808v1/x5.png — Training efficiency vs concept accuracy and number of concepts in MiniGrid
2604.04808v1/x6.png — Impact of concept accuracy vs. number of concepts on performance in CartPole
2604.04808v1/x7.png — Test-time intervention effectiveness across environments
2604.04808v1/x8.png — CUB dataset results showing DRS matches manual concept selection

Related Concepts