Q-Distance

Summary: A metric that measures the maximum difference between Q-values of state pairs across all actions, used to quantify how well concept-based abstractions preserve decision-relevant information in reinforcement learning. It serves as the formal measure of abstraction error that directly determines performance bounds for concept-based policies.

Overview

Q-Distance is a fundamental metric in State Abstraction theory that quantifies the abstraction error introduced when grouping states together under a concept-based representation. It measures the worst-case difference in optimal Q-values between any two states that share the same abstract representation across all possible actions.

The metric is mathematically expressed as:

Q-Distance = max_{s,s': g(s)=g(s')} max_a |Q*(s,a) - Q*(s',a)|

Where:

g(s) represents the concept-based abstraction function
s and s' are states with identical abstract representations
Q*(s,a) is the optimal Q-value for state s and action a

This measurement directly operationalizes the Decision-Relevant Concepts principle: if two states require different optimal actions but are grouped together by the abstraction, the Q-Distance will be large, indicating poor concept selection. Conversely, low Q-Distance indicates that states grouped together have similar decision consequences.

Key Details

Performance Bound: Q-Distance directly determines the performance bound for concept-based policies: V^π*(s) - V^π_c*(s) ≤ 2ε(g_c)/(1-γ)², where ε(g_c) is the abstraction error measured by Q-Distance and γ is the discount factor
Optimization Target: The Decision-Relevant Selection (DRS) algorithm minimizes Q-Distance through Mixed Integer Linear Programming to automatically select optimal concept subsets from candidate banks
Computational Complexity: The optimization problem involves O(n_d² + K) variables, where n_d is the number of distinct abstract states and K is the number of candidate concepts
Perfect vs Imperfect Settings: Q-Distance can be computed exactly when perfect concept predictors are available, or approximated using probabilistic constraints when concept predictions have uncertainty
Empirical Impact: Minimizing Q-Distance leads to significant performance improvements across environments - 159% improvement in CartPole, with consistent gains in MiniGrid, Atari games, and real-world glucose management tasks
NP-Hard Problem: While the general concept selection problem minimizing Q-Distance is NP-hard, tractable approximation algorithms exist for practical implementation

Relationships

Abstraction Error — Q-Distance is the formal measure of abstraction error in concept-based reinforcement learning
State Abstraction — Provides the theoretical foundation for why Q-Distance matters for preserving decision structure and policy performance
Decision-Relevant Concepts — Q-Distance operationalizes the decision-relevance principle into a quantifiable metric for concept selection
Test-Time Intervention — Lower Q-Distance improves the effectiveness of human corrections to concept predictions by 40-87% across environments
Concept-Based Models — Q-Distance measures how well concept representations preserve the decision-making quality of the original state space
Q-Learning — Q-Distance depends on optimal Q-values, connecting concept selection to value function approximation quality
Mixed Integer Linear Programming — The optimization framework used to minimize Q-Distance in the DRS algorithm for automated concept selection
Interpretable Reinforcement Learning — Q-Distance provides theoretical guarantees for maintaining performance while achieving interpretability through concept-based abstractions

Sources

sources/selecting-decision-relevant-concepts-in-reinforcement-learning — Introduced Q-Distance as the core metric for measuring abstraction error in concept-based RL, developed the theoretical framework connecting it to performance bounds, and demonstrated its use in the DRS algorithm for automated concept selection