Online-Mind2Web
Summary: A dataset of 52 human-annotated records for evaluating web agents. It serves as a benchmark for comparing web interaction approaches, particularly DOM-based versus GUI-based methods.
Overview
Online-Mind2Web is an evaluation dataset for assessing web agent performance on web-based tasks. Its human-annotated records provide the ground truth for measuring how well a given interaction method accomplishes real-world web navigation and interaction scenarios.
The dataset has been used in research comparing DOM Snapshots with GUI Snapshots for LLM-Based Interaction, where it served as the test bed for DOM Downsampling algorithms. Each record represents one web interaction task, and its human annotations define the expected outcome and the criteria for successful completion.
Key Details
- Dataset Size: 52 annotated records
- Purpose: Benchmarking web agent performance across different interaction modalities
- Annotation Type: Human-generated ground truth labels
- Primary Use Case: Evaluating Web Agents that use large language models as backends
- Evaluation Metrics: Success rates measured against human annotations
- Research Application: Comparative analysis of snapshot techniques, with reported results showing DOM-based methods achieving 67-73% success rates versus 65% for the GUI baseline
The dataset enables researchers to test various approaches to web interaction, including traditional screenshot-based methods with visual grounding cues and newer DOM-based approaches that leverage structured HTML representations.
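The success-rate metric listed above can be illustrated with a minimal Python sketch. It assumes each evaluated record reduces to a single pass/fail judgment against the human annotation. The `TaskResult` type, its field names, the method labels, and the toy data are all hypothetical and do not reflect the dataset's actual schema or any published result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    """One evaluated record. Field names are hypothetical, not the dataset schema."""
    task_id: str
    method: str  # illustrative labels such as "dom" or "gui"
    human_judged_success: bool  # pass/fail according to the human annotation


def success_rate(results: list[TaskResult], method: str) -> float:
    """Fraction of a method's records that the human annotation marks as successful."""
    relevant = [r for r in results if r.method == method]
    if not relevant:
        raise ValueError(f"no results recorded for method {method!r}")
    return sum(r.human_judged_success for r in relevant) / len(relevant)


# Made-up toy data for illustration only; not results from the dataset or the paper.
toy_results = [
    TaskResult("task-1", "dom", True),
    TaskResult("task-2", "dom", False),
    TaskResult("task-3", "dom", True),
    TaskResult("task-4", "dom", True),
    TaskResult("task-1", "gui", True),
    TaskResult("task-2", "gui", False),
    TaskResult("task-3", "gui", False),
    TaskResult("task-4", "gui", True),
]

dom_rate = success_rate(toy_results, "dom")
gui_rate = success_rate(toy_results, "gui")
print(f"dom: {dom_rate:.0%}, gui: {gui_rate:.0%}")
```

If every method is scored on all 52 records, each task accounts for roughly 1.9 percentage points (1/52), so a gap such as 67% versus 65% corresponds to about one task.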
Relationships
- DOM Downsampling — Evaluated using this dataset to measure algorithm effectiveness
- Web Agents — Performance benchmarked against human annotations in this dataset
- GUI Snapshots — Baseline approach tested and compared using these records
- LLM-Based Interaction — Web agent architectures evaluated through dataset tasks
- D2Snap Algorithm — Specific downsampling technique validated using dataset metrics
- Element Extraction — Alternative DOM processing methods compared against dataset ground truth
Sources
- sources/beyond-pixels-exploring-dom-downsampling-for-llm-based-web-agents — Primary research utilizing this dataset for web agent evaluation