Online-Mind2Web

Summary: A dataset of 52 human-annotated records for evaluating web agents. It serves as a benchmark for comparing web interaction approaches, particularly DOM-based versus GUI-based methods.

Overview

Online-Mind2Web is an evaluation dataset for assessing web agent performance on web-based tasks. Its human-annotated records provide the ground truth for measuring how effectively different interaction methods accomplish real-world web navigation and interaction scenarios.

The dataset has been used in research comparing DOM Snapshots with GUI Snapshots for LLM-Based Interaction, where it served as the test bed for DOM Downsampling algorithms such as D2Snap. Each record represents a web interaction task, with human annotations that establish the expected outcome and the criteria for successful completion.

Key Details

Dataset Size: 52 annotated records
Purpose: Benchmarking web agent performance across different interaction modalities
Annotation Type: Human-generated ground truth labels
Primary Use Case: Evaluating Web Agents that use large language models as backends
Evaluation Metrics: Success rate, i.e. the share of tasks judged successful against the human annotations
Research Application: Comparative analysis of snapshot techniques; reported results show DOM-based methods achieving 67-73% success rates versus 65% for GUI baselines

The dataset enables researchers to test various approaches to web interaction, including traditional screenshot-based methods with visual grounding cues and newer DOM-based approaches that leverage structured HTML representations.
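The success-rate metric listed above can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code used in the source research: the `TaskResult` type, its field names, the `"dom"`/`"gui"` labels, and the toy records are all assumptions made for the example and do not come from the dataset.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """One evaluated task (hypothetical schema, not the dataset's own)."""
    task_id: str        # hypothetical identifier for the record
    snapshot_type: str  # hypothetical label: "dom" or "gui"
    success: bool       # whether the run met the human-annotated criteria


def success_rate(results: list[TaskResult], snapshot_type: str) -> float:
    """Return the fraction of tasks of the given snapshot type that succeeded."""
    relevant = [r for r in results if r.snapshot_type == snapshot_type]
    if not relevant:
        raise ValueError(f"no results for snapshot type {snapshot_type!r}")
    return sum(r.success for r in relevant) / len(relevant)


# Toy records, invented purely to exercise the function.
toy_results = [
    TaskResult("t1", "dom", True),
    TaskResult("t2", "dom", True),
    TaskResult("t3", "dom", False),
    TaskResult("t4", "dom", True),
    TaskResult("t1", "gui", True),
    TaskResult("t2", "gui", False),
]

dom_rate = success_rate(toy_results, "dom")
gui_rate = success_rate(toy_results, "gui")
print(f"DOM: {dom_rate:.0%}, GUI: {gui_rate:.0%}")
```

Grouping results by snapshot type before dividing keeps the comparison between modalities on the same set of annotated tasks, which is what a DOM-versus-GUI comparison needs.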

Relationships

DOM Downsampling — Evaluated using this dataset to measure algorithm effectiveness
Web Agents — Performance benchmarked against human annotations in this dataset
GUI Snapshots — Baseline approach tested and compared using these records
LLM-Based Interaction — Web agent architectures evaluated through dataset tasks
D2Snap Algorithm — Specific downsampling technique validated using dataset metrics
Element Extraction — Alternative DOM processing methods compared against dataset ground truth

Sources

sources/beyond-pixels-exploring-dom-downsampling-for-llm-based-web-agents — Primary research utilizing this dataset for web agent evaluation