Q-Distance
Summary: A metric that measures the maximum difference between Q-values of state pairs across all actions, used to quantify how well concept-based abstractions preserve decision-relevant information in reinforcement learning. It serves as the formal measure of abstraction error that directly determines performance bounds for concept-based policies.
Overview
Q-Distance is a fundamental metric in State Abstraction theory that quantifies the abstraction error introduced when grouping states together under a concept-based representation. It measures the worst-case difference in optimal Q-values between any two states that share the same abstract representation across all possible actions.
The metric is mathematically expressed as:
Q-Distance = max_{s,s': g(s)=g(s')} max_a |Q*(s,a) - Q*(s',a)|
Where:
g(s)represents the concept-based abstraction functionsands'are states with identical abstract representationsQ*(s,a)is the optimal Q-value for statesand actiona
This measurement directly operationalizes the Decision-Relevant Concepts principle: if two states require different optimal actions but are grouped together by the abstraction, the Q-Distance will be large, indicating poor concept selection. Conversely, low Q-Distance indicates that states grouped together have similar decision consequences.
Key Details
- Performance Bound: Q-Distance directly determines the performance bound for concept-based policies:
V^π*(s) - V^π_c*(s) ≤ 2ε(g_c)/(1-γ)², whereε(g_c)is the abstraction error measured by Q-Distance andγis the discount factor - Optimization Target: The Decision-Relevant Selection (DRS) algorithm minimizes Q-Distance through Mixed Integer Linear Programming to automatically select optimal concept subsets from candidate banks
- Computational Complexity: The optimization problem involves O(n_d² + K) variables, where n_d is the number of distinct abstract states and K is the number of candidate concepts
- Perfect vs Imperfect Settings: Q-Distance can be computed exactly when perfect concept predictors are available, or approximated using probabilistic constraints when concept predictions have uncertainty
- Empirical Impact: Minimizing Q-Distance leads to significant performance improvements across environments - 159% improvement in CartPole, with consistent gains in MiniGrid, Atari games, and real-world glucose management tasks
- NP-Hard Problem: While the general concept selection problem minimizing Q-Distance is NP-hard, tractable approximation algorithms exist for practical implementation
Relationships
- Abstraction Error — Q-Distance is the formal measure of abstraction error in concept-based reinforcement learning
- State Abstraction — Provides the theoretical foundation for why Q-Distance matters for preserving decision structure and policy performance
- Decision-Relevant Concepts — Q-Distance operationalizes the decision-relevance principle into a quantifiable metric for concept selection
- Test-Time Intervention — Lower Q-Distance improves the effectiveness of human corrections to concept predictions by 40-87% across environments
- Concept-Based Models — Q-Distance measures how well concept representations preserve the decision-making quality of the original state space
- Q-Learning — Q-Distance depends on optimal Q-values, connecting concept selection to value function approximation quality
- Mixed Integer Linear Programming — The optimization framework used to minimize Q-Distance in the DRS algorithm for automated concept selection
- Interpretable Reinforcement Learning — Q-Distance provides theoretical guarantees for maintaining performance while achieving interpretability through concept-based abstractions
Sources
- sources/selecting-decision-relevant-concepts-in-reinforcement-learning — Introduced Q-Distance as the core metric for measuring abstraction error in concept-based RL, developed the theoretical framework connecting it to performance bounds, and demonstrated its use in the DRS algorithm for automated concept selection