source: "raw/articles/selecting-decision-relevant-concepts-in-reinforcement-learning.md"

Selecting Decision-Relevant Concepts in Reinforcement Learning

TL;DR: Proposes the first automated algorithms for selecting interpretable concepts in reinforcement learning. Concept selection is framed through state abstraction theory, and concepts are chosen to minimize abstraction error so that decision-making performance is preserved.

Key Points

  • Manual concept selection for interpretable RL is costly, requires domain expertise, and provides no performance guarantees
  • Introduces the Decision-Relevant Selection (DRS) algorithm, which automatically selects concepts by minimizing state abstraction error
  • Key insight: concepts are decision-relevant if removing them would cause agents to confuse states requiring different actions
  • Provides performance bounds showing that concept-based policies using decision-relevant concepts achieve near-optimal performance
  • Shows decision-relevant concepts improve test-time intervention effectiveness by 40-87% across environments
  • Empirically validates on CartPole, MiniGrid, Pong, Boxing, and real-world glucose management tasks
  • DRS can automatically recover manually curated concept sets while matching or exceeding their performance
  • Proves that the concept selection problem is NP-hard, but provides tractable approximation algorithms
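
To make the decision-relevance principle from the Key Points concrete, here is a minimal Python sketch. It is not the paper's DRS algorithm: it is a simple greedy heuristic under stated assumptions, namely that states are given as tuples of discrete concept values, that a reference action is known for every state, and that "abstraction error" is measured as the number of state pairs that become indistinguishable under the selected concepts while requiring different actions. The function names `abstraction_error` and `greedy_select` and the toy data are hypothetical.

```python
from itertools import combinations, product


def abstraction_error(states, actions, selected):
    """Count state pairs that collide under `selected` concepts
    but are labeled with different actions (illustrative error measure)."""
    buckets = {}
    for state, action in zip(states, actions):
        # States with the same values on the selected concepts are merged.
        key = tuple(state[i] for i in selected)
        buckets.setdefault(key, []).append(action)
    error = 0
    for bucket_actions in buckets.values():
        for a1, a2 in combinations(bucket_actions, 2):
            if a1 != a2:
                error += 1
    return error


def greedy_select(states, actions, k):
    """Greedily add the concept that most reduces the error, up to k concepts.
    Ties are broken in favor of the lowest concept index."""
    n_concepts = len(states[0])
    selected = []
    for _ in range(k):
        remaining = [c for c in range(n_concepts) if c not in selected]
        if not remaining:
            break
        best = min(
            remaining,
            key=lambda c: abstraction_error(states, actions, selected + [c]),
        )
        selected.append(best)
    return sorted(selected)


# Toy data (assumed for illustration): three binary concepts,
# and the required action depends only on concept 1.
states = list(product([0, 1], repeat=3))
actions = [s[1] for s in states]

print(greedy_select(states, actions, k=1))
print(abstraction_error(states, actions, [0]))
print(abstraction_error(states, actions, [1]))
```

In the toy data only concept 1 determines the action, so dropping it merges states that need different actions, while dropping concepts 0 or 2 costs nothing. A greedy pass like this can miss concepts that matter only in combination, which is one reason a more careful treatment is needed given the NP-hardness result noted above.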

Concepts Covered

Images and Figures

  • 2604.04808v1/x1.png — Shows standard manual concept selection pipeline with iterative refinement
  • 2604.04808v1/x2.png — Illustrates concept-based model architecture and decision-relevance principle
  • 2604.04808v1/x3.png and 2604.04808v1/x4.png — Performance comparison of DRS vs baselines with perfect/imperfect concept predictors
  • 2604.04808v1/x5.png — Training efficiency vs concept accuracy and number of concepts in MiniGrid
  • 2604.04808v1/x6.png — Impact of concept accuracy vs number on performance in CartPole
  • 2604.04808v1/x7.png — Test-time intervention effectiveness across environments
  • 2604.04808v1/x8.png — CUB dataset results showing DRS matches manual concept selection

Related Concepts