Concept Selection
Summary: Automated process of choosing an optimal subset of interpretable concepts from a candidate bank to maximize decision-making performance while preserving interpretability. Introduced for reinforcement learning through the Decision-Relevant Selection (DRS) algorithm, which selects concepts that minimize Abstraction Error so that the decision structure of the task is preserved.
Overview
Concept Selection addresses a fundamental challenge in Interpretable Machine Learning: how to automatically identify the most relevant concepts from a large pool of candidates without manual curation or domain expertise. The problem is computationally hard (NP-hard) but critical for deploying interpretable models at scale.
The core insight is that concepts should be selected for their decision relevance: their ability to distinguish between states or inputs that require different optimal actions or decisions. This contrasts with traditional Feature Selection approaches, which optimize predictive accuracy rather than decision quality.
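As a toy illustration of decision relevance, the sketch below groups states by a single boolean concept and counts the groups whose members disagree on the optimal action. The CartPole-like states, the concept names, and the helper `action_conflicts` are all hypothetical and not taken from the source. Both concepts are equally easy to read off the raw state, but only one of them separates states that need different actions.

```python
from collections import defaultdict

# Hypothetical CartPole-like states, each labeled with its optimal action.
STATES = [
    {"pole_angle": -0.2, "cart_speed": 0.1, "best_action": "left"},
    {"pole_angle": -0.1, "cart_speed": 0.9, "best_action": "left"},
    {"pole_angle": 0.1, "cart_speed": 0.1, "best_action": "right"},
    {"pole_angle": 0.2, "cart_speed": 0.9, "best_action": "right"},
]

# Hypothetical candidate concepts: boolean functions of the raw state.
CONCEPTS = {
    "pole_leans_right": lambda s: s["pole_angle"] > 0,
    "cart_is_fast": lambda s: s["cart_speed"] > 0.5,
}


def action_conflicts(concept, states):
    """Count concept groups whose member states disagree on the optimal action."""
    groups = defaultdict(set)
    for s in states:
        groups[concept(s)].add(s["best_action"])
    return sum(1 for actions in groups.values() if len(actions) > 1)


if __name__ == "__main__":
    for name, concept in CONCEPTS.items():
        print(name, action_conflicts(concept, STATES))
```

Here `pole_leans_right` yields 0 conflicting groups, while `cart_is_fast` yields 2: it is a perfectly well-defined feature, yet every group it forms mixes "left" and "right" states, so a policy conditioned on it alone cannot act optimally.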
In the reinforcement learning context, Concept Selection works by evaluating how well different concept subsets preserve the underlying decision structure of the original state space. The Decision-Relevant Selection (DRS) algorithm accomplishes this by:
- Computing Q-Distance between states to measure decision similarity
- Selecting concepts that minimize Abstraction Error when states are grouped by concept values
- Using Mixed Integer Linear Programming (MILP) to solve a tractable approximation of the NP-hard optimization
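The three steps above can be sketched on a toy problem. This is a minimal sketch under stated assumptions, not the DRS implementation. It assumes Q-Distance is the largest per-action gap between two states' Q-values and Abstraction Error is the worst Q-Distance between two states that agree on every selected concept. It also substitutes an exhaustive search over size-k subsets for the MILP solver. The data and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical Q-values: one row per state, one column per action.
Q_VALUES = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.2, 0.8]]
# Hypothetical binary concept values: one row per state, one column per concept.
CONCEPT_VALUES = [[0, 0], [0, 1], [1, 0], [1, 1]]


def q_distance(q_s, q_t):
    """Assumed Q-Distance: largest absolute per-action difference in Q-values."""
    return max(abs(a - b) for a, b in zip(q_s, q_t))


def abstraction_error(q_values, concept_values, subset):
    """Assumed Abstraction Error: worst Q-Distance between any two states
    that share the same value on every concept in `subset`."""
    worst = 0.0
    for i, j in combinations(range(len(q_values)), 2):
        if all(concept_values[i][c] == concept_values[j][c] for c in subset):
            worst = max(worst, q_distance(q_values[i], q_values[j]))
    return worst


def select_concepts(q_values, concept_values, k):
    """Brute-force stand-in for the MILP: try every size-k subset of concepts
    and keep the one with the lowest abstraction error (exponential cost)."""
    n_concepts = len(concept_values[0])
    return min(
        combinations(range(n_concepts), k),
        key=lambda subset: abstraction_error(q_values, concept_values, subset),
    )


if __name__ == "__main__":
    best = select_concepts(Q_VALUES, CONCEPT_VALUES, k=1)
    print(best, abstraction_error(Q_VALUES, CONCEPT_VALUES, best))
```

With `k=1` the search keeps concept 0, whose groups contain only states with similar Q-values (error 0.2), and rejects concept 1, which merges states whose greedy actions differ (error 1.0). Enumeration is only viable for tiny banks; the number of subsets grows combinatorially, which is why a solver-based formulation is needed at scale.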
Key Details
- Performance Guarantees: Provides theoretical bounds showing that policies using decision-relevant concepts achieve near-optimal performance compared to policies with full state information
- Empirical Results: Demonstrates 40-87% improvement in Test-Time Intervention effectiveness across CartPole, MiniGrid, Pong, Boxing, and glucose management tasks
- Concept Recovery: Can automatically recover manually curated concept sets while matching or exceeding their performance on the CUB dataset
- Computational Complexity: The general concept selection problem is NP-hard, but DRS provides polynomial-time approximation algorithms
- Scalability: Works with concept banks containing hundreds of candidate concepts, automatically selecting the 5-15 most decision-relevant ones
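To make the complexity point concrete, here is one possible MILP for choosing at most $k$ concepts that minimize the worst-case Abstraction Error. This formulation is an illustrative assumption, not necessarily the one DRS uses. In it, $d_{ij}$ is the Q-Distance between states $i$ and $j$, $D_{ij}$ is the set of candidate concepts on which the two states differ, $x_c = 1$ iff concept $c$ is selected, and $y_{ij} = 1$ is forced whenever no selected concept separates $i$ and $j$.

```latex
\begin{aligned}
\min_{x,\,y,\,\varepsilon}\quad & \varepsilon \\
\text{s.t.}\quad & \varepsilon \ge d_{ij}\, y_{ij} && \forall\, i < j \\
& y_{ij} \ge 1 - \sum_{c \in D_{ij}} x_c && \forall\, i < j \\
& \sum_{c} x_c \le k \\
& \varepsilon \ge 0,\quad x_c \in \{0,1\},\quad y_{ij} \in \{0,1\}
\end{aligned}
```

When some selected concept in $D_{ij}$ separates a pair, the second constraint allows $y_{ij} = 0$, and minimization drives it there. The optimal $\varepsilon$ is therefore the largest $d_{ij}$ among pairs that remain merged. The number of $y_{ij}$ variables grows quadratically with the number of states.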
The approach fundamentally changes how interpretable models are built, moving from manual concept engineering to automated, principled selection with performance guarantees.
Relationships
- Decision-Relevant Selection — primary algorithm implementing concept selection for RL
- State Abstraction — theoretical foundation measuring how concept groupings preserve decision structure
- Concept-Based Models — interpretable architecture that concept selection optimizes for
- Abstraction Error — optimization objective measuring decision-preservation quality
- Test-Time Intervention — downstream application where good concept selection improves human correction effectiveness
- Feature Selection — related but distinct problem focusing on prediction rather than decision quality
- Interpretable Reinforcement Learning — broader field where concept selection enables automated interpretability
- Q-Distance — metric for measuring decision similarity between states
Sources
- sources/selecting-decision-relevant-concepts-in-reinforcement-learning — introduced automated concept selection for RL, DRS algorithm, theoretical foundations, and empirical validation