source: "raw/articles/halluminate-rl-environments-for-financial-services.md"

Summary: Westworld - Simulated Web Environments for Agent Evaluation

TL;DR: Halluminate and Yutori built Westworld, a suite of 5 realistic web simulators with 100 tasks for evaluating and training web agents. The simulators make benchmarking more reproducible than testing on real websites and can also serve as RL training environments.

Key Points

  • Real website evaluation suffers from CAPTCHAs, authentication blocks, changing data, and UI drift that create evaluation noise
  • Westworld includes 5 simulators: Noodle Flights, Travelpedia (travel), GoodBuy, Azora, Megamart (ecommerce)
  • Uses a task-centric rather than app-centric simulation approach: each simulator covers the core workflows a task needs instead of rebuilding the entire site
  • Employs three types of verifiable rewards: state-based unit tests, component-level verification, and real-time ground truth calculation
  • In the reported results, Yutori n1 (trained on Westworld) averaged 86% vs 67.7% for Claude Sonnet 4.5
  • Common failure modes include UI grounding (calendar date picking), reasoning errors on multi-step tasks, and unfamiliarity with site-specific navigation patterns
  • Simulators require significant upfront engineering and ongoing maintenance as real sites evolve
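
To make the verifiable-reward idea above concrete, here is a minimal sketch of a state-based check combined with a ground truth computed at check time. It is illustrative only: `Flight`, `SimState`, `cheapest_flight`, and `reward_booked_cheapest` are hypothetical names invented for this note, not Westworld's actual API, and the "book the cheapest flight" task is an assumed example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Flight:
    flight_id: str
    origin: str
    destination: str
    price: float


@dataclass
class SimState:
    """Hypothetical snapshot of a flight simulator's backend state."""
    flights: list[Flight]
    booked_ids: list[str] = field(default_factory=list)


def cheapest_flight(state: SimState, origin: str, destination: str) -> Optional[Flight]:
    """Ground truth computed from the simulator's current data, not hard-coded."""
    candidates = [
        f for f in state.flights
        if f.origin == origin and f.destination == destination
    ]
    return min(candidates, key=lambda f: f.price) if candidates else None


def reward_booked_cheapest(state: SimState, origin: str, destination: str) -> float:
    """State-based check: 1.0 only if exactly the cheapest flight was booked."""
    truth = cheapest_flight(state, origin, destination)
    if truth is None:
        return 0.0
    return 1.0 if state.booked_ids == [truth.flight_id] else 0.0
```

Because the reward inspects the final state rather than the agent's click sequence, any path that ends with the correct booking scores 1.0, and recomputing the ground truth on each check keeps the reward valid when the simulated data changes.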

Concepts Covered

Images and Figures

  • halluminate-rl-environments-for-financial-services/img-0.png - Noodle Flights interface
  • halluminate-rl-environments-for-financial-services/img-1.png - Noodle Flights simulator
  • halluminate-rl-environments-for-financial-services/img-2.png - Travelpedia simulator
  • halluminate-rl-environments-for-financial-services/img-3.png - GoodBuy simulator
  • halluminate-rl-environments-for-financial-services/img-4.png - Azora simulator
  • halluminate-rl-environments-for-financial-services/img-5.png - Megamart simulator
  • Error analysis examples showing calendar date picking failures, flight selection reasoning errors, and site navigation challenges
  • Japan Airlines redesign comparison showing maintenance challenges for simulators

Related Concepts