UI Feature Classification

Summary: A GPT-4o-derived taxonomy that rates HTML elements and their attributes by semantic importance for web automation tasks. It distinguishes container, content, interactive, and other element types to guide DOM processing algorithms.

Overview

UI Feature Classification is a semantic taxonomy developed to systematically categorize HTML elements based on their functional role and importance in web user interfaces. The classification was derived from GPT-4o's analysis of web elements and serves as ground truth for training and evaluating DOM Downsampling algorithms used by Web Agents.

The taxonomy recognizes that different HTML elements serve distinct purposes in web applications: some provide structure (containers), others present information (content), some enable user interaction (interactive), and the rest serve auxiliary functions (other). This semantic understanding allows automated systems to make informed decisions about which elements to preserve, modify, or remove during DOM processing.

Key Details

Element Categories:

  • Container Elements: Structural elements that organize layout and hierarchy (e.g., div, section, header)
  • Content Elements: Information-bearing elements that display text, media, or data to users
  • Interactive Elements: User interface controls that enable interaction (buttons, forms, links, inputs)
  • Other Elements: Auxiliary elements with specialized or minor roles
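As a minimal illustration of the four categories above, the following Python sketch maps tag names to categories with a plain lookup table. The `Category` enum, the `TAG_CATEGORIES` table, and `classify` are hypothetical names; the table contents are illustrative guesses rather than the GPT-4o-derived assignments, and a lookup on tag name alone ignores the contextual importance the classification also considers.

```python
from enum import Enum


class Category(Enum):
    CONTAINER = "container"
    CONTENT = "content"
    INTERACTIVE = "interactive"
    OTHER = "other"


# Hypothetical tag-to-category table, for illustration only;
# the actual GPT-4o-derived assignments may differ.
TAG_CATEGORIES = {
    # Structural elements that organize layout and hierarchy
    "div": Category.CONTAINER,
    "section": Category.CONTAINER,
    "header": Category.CONTAINER,
    "main": Category.CONTAINER,
    "nav": Category.CONTAINER,
    # Information-bearing elements
    "p": Category.CONTENT,
    "h1": Category.CONTENT,
    "h2": Category.CONTENT,
    "img": Category.CONTENT,
    "li": Category.CONTENT,
    # User interface controls
    "a": Category.INTERACTIVE,
    "button": Category.INTERACTIVE,
    "form": Category.INTERACTIVE,
    "input": Category.INTERACTIVE,
    "select": Category.INTERACTIVE,
    "textarea": Category.INTERACTIVE,
}


def classify(tag: str) -> Category:
    """Return the category for a tag name (case-insensitive), defaulting to OTHER."""
    return TAG_CATEGORIES.get(tag.lower(), Category.OTHER)
```

Unknown tags fall through to `Category.OTHER`, so under this table `classify("script")` returns `Category.OTHER` and `classify("BUTTON")` returns `Category.INTERACTIVE`.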

Semantic Importance Ratings:

  • Elements rated on semantic significance for task completion
  • Higher ratings indicate greater importance for preserving element functionality
  • Ratings inform downsampling decisions in algorithms like D2Snap
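One way ratings could drive downsampling decisions is a pair of thresholds: drop elements rated below one cutoff and strip attributes rated below another. This sketch assumes a 0.0 to 1.0 rating scale; the rating values, the thresholds, and the names `ELEMENT_RATINGS`, `ATTRIBUTE_RATINGS`, and `filter_element` are hypothetical and are not the ratings D2Snap uses.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical importance ratings on a 0.0-1.0 scale (higher = more important
# for task completion). Tags or attributes missing from a table are rated 0.0.
ELEMENT_RATINGS: Dict[str, float] = {
    "button": 0.95, "input": 0.95, "a": 0.9, "h1": 0.8,
    "p": 0.6, "div": 0.3, "span": 0.2, "script": 0.0,
}
ATTRIBUTE_RATINGS: Dict[str, float] = {
    "href": 0.9, "aria-label": 0.85, "name": 0.8, "placeholder": 0.7,
    "id": 0.5, "class": 0.2, "style": 0.05,
}


@dataclass
class Element:
    tag: str
    attrs: Dict[str, str] = field(default_factory=dict)


def filter_element(
    el: Element, el_threshold: float = 0.5, attr_threshold: float = 0.5
) -> Optional[Element]:
    """Return None if the element is rated below el_threshold; otherwise return
    a copy that keeps only attributes rated at or above attr_threshold."""
    if ELEMENT_RATINGS.get(el.tag, 0.0) < el_threshold:
        return None
    kept = {
        name: value
        for name, value in el.attrs.items()
        if ATTRIBUTE_RATINGS.get(name, 0.0) >= attr_threshold
    }
    return Element(el.tag, kept)
```

The sketch treats each element in isolation. A real downsampler would also have to decide what happens to the children of a dropped element, which is where the handling of container elements matters.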

Derivation Process:

  • Ground truth classifications generated by GPT-4o analysis
  • Based on element roles in typical web automation scenarios
  • Considers both element type and contextual importance
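Because the ground truth comes from model output, it is worth validating before training or evaluation. This sketch assumes, hypothetically, that labels are stored as a JSON array of objects with `tag`, `category`, and `rating` fields, with ratings on a 0.0 to 1.0 scale. Neither this format nor the names `GroundTruthLabel` and `parse_labels` come from the source.

```python
import json
from dataclasses import dataclass
from typing import List

# The four categories of the taxonomy, in lowercase.
VALID_CATEGORIES = {"container", "content", "interactive", "other"}


@dataclass(frozen=True)
class GroundTruthLabel:
    tag: str
    category: str
    rating: float


def parse_labels(raw: str) -> List[GroundTruthLabel]:
    """Parse a (hypothetical) JSON label file, normalizing case and rejecting
    unknown categories or out-of-range ratings with ValueError."""
    labels = []
    for item in json.loads(raw):
        tag = item["tag"].lower()
        category = item["category"].lower()
        if category not in VALID_CATEGORIES:
            raise ValueError(f"unknown category {category!r} for tag {tag!r}")
        rating = float(item["rating"])
        if not 0.0 <= rating <= 1.0:
            raise ValueError(f"rating {rating} out of range for tag {tag!r}")
        labels.append(GroundTruthLabel(tag, category, rating))
    return labels
```

Rejecting malformed records up front keeps a stray model response from silently corrupting the ground truth.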

Applications:

Performance Impact:

  • Hierarchy preservation identified as the most valuable UI feature for LLMs
  • Proper classification enables success rates of 67–73% on web automation tasks
  • Handling of container elements is crucial for maintaining DOM structure
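Since hierarchy preservation matters most, a downsampler has to cut container overhead without flattening the tree. One simple approach is to merge chains of single-child containers while leaving branching containers and all non-container elements in place. This is an illustrative technique, not D2Snap's published procedure; `Node`, `CONTAINER_TAGS`, `collapse_containers`, and `depth` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical set of tags treated as containers.
CONTAINER_TAGS = {"div", "section", "header", "main", "article", "nav"}


@dataclass
class Node:
    tag: str
    children: List["Node"] = field(default_factory=list)
    text: str = ""


def collapse_containers(node: Node) -> Node:
    """Bottom-up, merge a text-less container whose only child is another
    container into a single node, keeping the outer tag. Branching structure
    and non-container nodes are left intact."""
    children = [collapse_containers(child) for child in node.children]
    if (
        node.tag in CONTAINER_TAGS
        and not node.text
        and len(children) == 1
        and children[0].tag in CONTAINER_TAGS
    ):
        inner = children[0]
        # Adopt the inner container's children and text.
        return Node(node.tag, inner.children, inner.text)
    return Node(node.tag, children, node.text)


def depth(node: Node) -> int:
    """Number of levels in the tree rooted at node."""
    return 1 + max((depth(child) for child in node.children), default=0)
```

For example, `div > div > section > [p, button]` collapses to `div > [p, button]`: the depth drops from 4 to 2, but the grouping of the paragraph and button under one container survives.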

Relationships

  • DOM Downsampling — uses classification to guide element selection and processing strategies
  • Web Agents — rely on classification for understanding which page elements matter for task completion
  • Element Extraction — applies classification to filter relevant elements while preserving important features
  • D2Snap — implements hierarchical processing based on container/content/interactive distinctions
  • LLM Context Windows — classification helps optimize token usage by prioritizing semantically important elements
  • CSS Selectors — semantic understanding improves programmatic element targeting accuracy
  • Accessibility Trees — related approach to semantic element classification for assistive technologies
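To make the relationship with LLM context windows concrete, the sketch below greedily selects the highest-rated elements that fit a token budget and then restores document order. The `Candidate` type, the ratings, and the estimate of roughly four characters per token are assumptions for illustration. A production system would count tokens with the target model's tokenizer.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    index: int     # position in document order
    html: str      # serialized element
    rating: float  # hypothetical semantic importance rating


def estimate_tokens(text: str) -> int:
    # Rough heuristic (~4 characters per token), not a real tokenizer.
    return math.ceil(len(text) / 4)


def select_within_budget(cands: List[Candidate], budget: int) -> List[Candidate]:
    """Visit candidates from highest to lowest rating, keeping each one that
    still fits the remaining budget; return the kept ones in document order."""
    chosen: List[Candidate] = []
    used = 0
    for cand in sorted(cands, key=lambda c: c.rating, reverse=True):
        cost = estimate_tokens(cand.html)
        if used + cost <= budget:
            chosen.append(cand)
            used += cost
    return sorted(chosen, key=lambda c: c.index)
```

This greedy pass is a heuristic, not an optimal knapsack solution. Restoring document order afterward keeps the surviving elements readable as a page rather than as a ranked list.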

Sources