Concept Selection

Summary: Automated process of choosing an optimal subset of interpretable concepts from a candidate bank to maximize decision-making performance while maintaining interpretability. Originally developed for reinforcement learning through the Decision-Relevant Selection algorithm, which minimizes Abstraction Error to preserve decision structure.

Overview

Concept Selection addresses a fundamental challenge in Interpretable Machine Learning: how to automatically identify the most relevant concepts from a large pool of candidates without requiring manual curation or domain expertise. The problem is computationally hard (NP-hard) but critical for deploying interpretable models at scale.

The core insight is that concepts should be selected for their decision-relevance: their ability to distinguish between states or inputs that require different optimal actions or decisions. This contrasts with traditional Feature Selection approaches, which focus on predictive accuracy rather than decision quality.

In the reinforcement learning context, Concept Selection works by evaluating how well different concept subsets preserve the underlying decision structure of the original state space. The Decision-Relevant Selection (DRS) algorithm accomplishes this by:

Computing Q-Distance between states to measure decision similarity
Selecting concepts that minimize Abstraction Error when states are grouped by concept values
Using Mixed Integer Linear Programming for tractable approximation of the NP-hard optimization
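
The three steps above can be illustrated with a minimal sketch on a toy tabular problem. The definitions here are assumptions chosen for illustration and are not taken from the source: Q-Distance is taken to be the largest absolute difference in Q-values over actions, Abstraction Error is taken to be the largest Q-Distance between any two states that share the same values on the selected concepts, and an exhaustive search over subsets stands in for the Mixed Integer Linear Programming formulation. All function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def q_distance(q_values):
    """Pairwise Q-Distance (assumed form): max over actions of |Q(s,a) - Q(s',a)|."""
    diff = np.abs(q_values[:, None, :] - q_values[None, :, :])
    return diff.max(axis=2)


def abstraction_error(concepts, subset, dist):
    """Assumed form: worst Q-Distance between states grouped together by `subset`."""
    keys = concepts[:, list(subset)]
    # Two states fall in the same group when they agree on every selected concept.
    same_group = (keys[:, None, :] == keys[None, :, :]).all(axis=2)
    return float(dist[same_group].max())


def select_concepts(q_values, concepts, k):
    """Brute-force stand-in for the MILP: try every size-k subset of concepts."""
    dist = q_distance(q_values)
    candidates = combinations(range(concepts.shape[1]), k)
    best = min(candidates, key=lambda s: abstraction_error(concepts, s, dist))
    return best, abstraction_error(concepts, best, dist)


# Toy data: 4 states, 2 actions. States 0-1 prefer action 0, states 2-3 prefer action 1.
q_values = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
# Three binary candidate concepts (columns); only column 0 tracks the preferred action.
concepts = np.array([[0, 0, 0], [0, 1, 0], [1, 0, 0], [1, 1, 1]])

best, err = select_concepts(q_values, concepts, k=1)
print(best, err)
```

In this toy setup the search should pick concept 0, because grouping states by it only merges states with nearly identical Q-values. Exhaustive search scales combinatorially with the size of the concept bank, which is why a solver-based formulation is needed beyond toy sizes.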

Key Details

Performance Guarantees: Provides theoretical bounds showing that policies using decision-relevant concepts achieve near-optimal performance compared to policies with full state information
Empirical Results: Demonstrates 40-87% improvement in Test-Time Intervention effectiveness across CartPole, MiniGrid, Pong, Boxing, and glucose management tasks
Concept Recovery: Can automatically recover manually curated concept sets while matching or exceeding their performance on the CUB dataset
Computational Complexity: The general concept selection problem is NP-hard, but DRS provides polynomial-time approximation algorithms
Scalability: Works with concept banks containing hundreds of candidate concepts, automatically selecting the 5-15 most decision-relevant ones

The approach changes how interpretable models are built, moving from manual concept engineering to automated, principled selection with performance guarantees.

Relationships

Decision-Relevant Selection — primary algorithm implementing concept selection for RL
State Abstraction — theoretical foundation measuring how concept groupings preserve decision structure
Concept-Based Models — interpretable architecture that concept selection optimizes for
Abstraction Error — optimization objective measuring decision-preservation quality
Test-Time Intervention — downstream application where good concept selection improves human correction effectiveness
Feature Selection — related but distinct problem focusing on prediction rather than decision quality
Interpretable Reinforcement Learning — broader field where concept selection enables automated interpretability
Q-Distance — metric for measuring decision similarity between states

Sources

sources/selecting-decision-relevant-concepts-in-reinforcement-learning — introduced automated concept selection for RL, DRS algorithm, theoretical foundations, and empirical validation