Q-Distance

Summary: A metric that measures the maximum difference between Q-values of state pairs across all actions, used to quantify how well concept-based abstractions preserve decision-relevant information in reinforcement learning. It serves as the formal measure of abstraction error that directly determines performance bounds for concept-based policies.

Overview

Q-Distance is a fundamental metric in State Abstraction theory that quantifies the abstraction error introduced when grouping states together under a concept-based representation. It measures the worst-case difference in optimal Q-values between any two states that share the same abstract representation across all possible actions.

The metric is mathematically expressed as:

Q-Distance = max_{s,s': g(s)=g(s')} max_a |Q*(s,a) - Q*(s',a)|

Where:

  • g(s) represents the concept-based abstraction function
  • s and s' are states with identical abstract representations
  • Q*(s,a) is the optimal Q-value for state s and action a

This measurement directly operationalizes the Decision-Relevant Concepts principle: if two states require different optimal actions but are grouped together by the abstraction, the Q-Distance will be large, indicating poor concept selection. Conversely, low Q-Distance indicates that states grouped together have similar decision consequences.

Key Details

  • Performance Bound: Q-Distance directly determines the performance bound for concept-based policies: V^π*(s) - V^π_c*(s) ≤ 2ε(g_c)/(1-γ)², where ε(g_c) is the abstraction error measured by Q-Distance and γ is the discount factor
  • Optimization Target: The Decision-Relevant Selection (DRS) algorithm minimizes Q-Distance through Mixed Integer Linear Programming to automatically select optimal concept subsets from candidate banks
  • Computational Complexity: The optimization problem involves O(n_d² + K) variables, where n_d is the number of distinct abstract states and K is the number of candidate concepts
  • Perfect vs Imperfect Settings: Q-Distance can be computed exactly when perfect concept predictors are available, or approximated using probabilistic constraints when concept predictions have uncertainty
  • Empirical Impact: Minimizing Q-Distance leads to significant performance improvements across environments - 159% improvement in CartPole, with consistent gains in MiniGrid, Atari games, and real-world glucose management tasks
  • NP-Hard Problem: While the general concept selection problem minimizing Q-Distance is NP-hard, tractable approximation algorithms exist for practical implementation

Relationships

  • Abstraction Error — Q-Distance is the formal measure of abstraction error in concept-based reinforcement learning
  • State Abstraction — Provides the theoretical foundation for why Q-Distance matters for preserving decision structure and policy performance
  • Decision-Relevant Concepts — Q-Distance operationalizes the decision-relevance principle into a quantifiable metric for concept selection
  • Test-Time Intervention — Lower Q-Distance improves the effectiveness of human corrections to concept predictions by 40-87% across environments
  • Concept-Based Models — Q-Distance measures how well concept representations preserve the decision-making quality of the original state space
  • Q-Learning — Q-Distance depends on optimal Q-values, connecting concept selection to value function approximation quality
  • Mixed Integer Linear Programming — The optimization framework used to minimize Q-Distance in the DRS algorithm for automated concept selection
  • Interpretable Reinforcement Learning — Q-Distance provides theoretical guarantees for maintaining performance while achieving interpretability through concept-based abstractions

Sources