LLM Ground Truth
Summary: GPT-4o-generated semantic ratings used to evaluate DOM elements and attributes for their importance in web interface understanding. These ground truth labels provide quantitative measures of UI feature semantics to guide DOM downsampling algorithms.
Overview
LLM Ground Truth refers to the semantic importance ratings generated by GPT-4o to evaluate HTML elements and attributes based on their significance for web interface understanding. This ground truth data serves as the foundation for the D2Snap algorithm's decision-making process when determining which DOM components to preserve or remove during downsampling.
The ground truth system addresses a critical challenge in DOM Downsampling: how to quantitatively assess the semantic value of different HTML elements and attributes for LLM Web Agents. Rather than relying on hand-written heuristic rules, the approach uses GPT-4o's understanding of web interfaces to produce consistent, semantics-based evaluations.
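The core idea is that a rating table replaces hand-written rules: instead of hard-coding which attributes matter, a downsampler keeps only those whose rating clears a threshold. The sketch below shows this for attributes. It assumes a 0 to 1 rating scale, and the rating values, the `Element` type, and `filter_attributes` are all hypothetical. They are not the actual GPT-4o ratings or D2Snap's code.

```python
from dataclasses import dataclass, field

# HYPOTHETICAL ratings on an assumed 0-1 scale; illustrative only,
# not the GPT-4o values reported in the source.
ATTRIBUTE_RATINGS: dict[str, float] = {
    "href": 0.95,
    "type": 0.85,
    "aria-label": 0.8,
    "placeholder": 0.75,
    "alt": 0.7,
    "id": 0.4,
    "class": 0.3,
    "style": 0.1,
    "data-testid": 0.05,
}


@dataclass
class Element:
    """Minimal stand-in for a DOM element (hypothetical type)."""
    tag: str
    attrs: dict[str, str] = field(default_factory=dict)
    children: list["Element"] = field(default_factory=list)


def filter_attributes(node: Element, threshold: float, default: float = 0.0) -> Element:
    """Return a new tree that keeps only attributes rated >= threshold.

    Attributes missing from the rating table fall back to `default`.
    The input tree is not modified.
    """
    kept = {
        name: value
        for name, value in node.attrs.items()
        if ATTRIBUTE_RATINGS.get(name, default) >= threshold
    }
    return Element(
        node.tag,
        kept,
        [filter_attributes(child, threshold, default) for child in node.children],
    )
```

For example, `filter_attributes(Element("a", {"href": "/cart", "class": "btn", "style": "color:red"}), 0.5).attrs` keeps only `{"href": "/cart"}`. Raising or lowering the threshold trades snapshot size against retained semantics.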
Key Details
- Generation Method: GPT-4o produces numerical ratings for DOM elements and attributes based on their UI semantic importance
- Application: Guides the D2Snap algorithm's downsampling decisions, i.e. which elements and attributes are preserved, consolidated, or removed
- UI Feature Categories: Evaluates elements across multiple dimensions including hierarchy, interactivity, content value, and structural significance
- Validation: Ground truth ratings correlate with empirical performance — elements rated as more important by GPT-4o contribute more to task success when preserved
- Hierarchy Importance: Analysis reveals hierarchy as the most critical UI feature, with removal causing the largest performance degradation
- Element Classification: Supports the four-category taxonomy: container, content, interactive, and other elements
- Consistency: Provides standardized evaluation criteria across diverse web interfaces and DOM structures
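To see how per-element ratings and the four-category taxonomy fit together, the sketch below classifies tags into container, content, interactive, and other, then averages ratings within each category. The tag-to-category mapping and the example ratings are hypothetical placeholders, not the source's lists or GPT-4o's scores. Any tag not listed defaults to "other".

```python
from enum import Enum
from statistics import mean


class Category(Enum):
    CONTAINER = "container"
    CONTENT = "content"
    INTERACTIVE = "interactive"
    OTHER = "other"


# HYPOTHETICAL tag sets for illustration; not the source's exact classification.
_CATEGORY_TAGS: dict[Category, set[str]] = {
    Category.CONTAINER: {"div", "section", "article", "main", "nav", "header", "footer", "form", "ul"},
    Category.CONTENT: {"p", "h1", "h2", "h3", "span", "img", "td"},
    Category.INTERACTIVE: {"a", "button", "input", "select", "textarea"},
}


def classify(tag: str) -> Category:
    """Map an HTML tag name (case-insensitive) to one of the four categories."""
    name = tag.lower()
    for category, tags in _CATEGORY_TAGS.items():
        if name in tags:
            return category
    return Category.OTHER


def mean_rating_by_category(ratings: dict[str, float]) -> dict[Category, float]:
    """Average per-tag ratings within each category.

    Categories without any rated tag are omitted from the result.
    """
    buckets: dict[Category, list[float]] = {}
    for tag, score in ratings.items():
        buckets.setdefault(classify(tag), []).append(score)
    return {category: mean(scores) for category, scores in buckets.items()}
```

With hypothetical ratings such as `{"button": 1.0, "input": 0.5, "div": 0.25, "section": 0.75, "p": 0.5, "script": 0.0}`, the interactive category averages 0.75 and containers 0.5. Comparisons of this kind are one way ratings can inform how each category is treated.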
Relationships
- D2Snap — uses LLM ground truth ratings to guide element consolidation decisions
- DOM Downsampling — provides the semantic foundation for size reduction algorithms
- UI Feature Semantics — represents the specific domain of interface understanding being quantified
- Element Classification — ground truth ratings inform the categorization of HTML elements by function
- LLM Web Agents — enables more effective state representation through semantically-aware DOM processing
- Web Agent Snapshots — supports the transition from pixel-based to DOM-based agent representations
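The D2Snap relationship above mentions rating-guided element consolidation. One simple way to picture it is that low-rated container elements without their own text are dropped and their children are spliced into the parent. This reduces depth while keeping content and interactive elements. The sketch below is a hypothetical illustration under an assumed 0 to 1 rating scale. `CONTAINER_RATINGS`, `Node`, `consolidate`, and `depth` are invented names, and the merge rule is not D2Snap's actual procedure.

```python
from dataclasses import dataclass, field

# HYPOTHETICAL container ratings on an assumed 0-1 scale (not from the source).
CONTAINER_RATINGS: dict[str, float] = {"div": 0.2, "article": 0.5, "section": 0.6, "nav": 0.7, "main": 0.8}


@dataclass
class Node:
    tag: str
    children: list["Node"] = field(default_factory=list)
    text: str = ""


def consolidate(node: Node, threshold: float) -> list[Node]:
    """Return the node(s) that replace `node` after consolidation.

    A container rated below `threshold` that carries no text of its own is
    removed, and its consolidated children take its place in the parent.
    All other nodes are kept, with their children consolidated recursively.
    """
    new_children = [n for child in node.children for n in consolidate(child, threshold)]
    rating = CONTAINER_RATINGS.get(node.tag)
    if rating is not None and rating < threshold and not node.text:
        return new_children
    return [Node(node.tag, new_children, node.text)]


def depth(node: Node) -> int:
    """Number of levels in the tree rooted at `node`."""
    return 1 + max((depth(child) for child in node.children), default=0)
```

For a tree `main > div > div > button` plus `main > section > p`, consolidating at threshold 0.5 removes both `div` wrappers. The `button` then sits directly under `main`, and depth drops from 4 to 3. The `section` is kept because its rating clears the threshold.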
Sources
- sources/beyond-pixels-exploring-dom-downsampling-for-llm-based-web-agents — introduced the concept and demonstrated its application in DOM downsampling research