source: "raw/articles/code2world-a-gui-world-model-via-renderable-code-generation.md"

Summary: Code2World: A GUI World Model via Renderable Code Generation

TL;DR: Code2World is a vision-language model that predicts next GUI states by generating renderable HTML code instead of pixels, achieving high visual fidelity while enabling fine-grained structural control for autonomous GUI agents.

Key Points

  • Code2World generates renderable HTML code to simulate next visual states rather than using pixel-based or text-based approaches
  • The AndroidCode dataset contains over 80K high-quality screen-action pairs, created by translating GUI trajectories into HTML
  • Uses a visual-feedback revision loop to refine the synthesized code, ensuring a SigLIP score > 0.9 for strict visual alignment
  • Two-stage training: SFT cold start followed by Render-Aware Reinforcement Learning (RARL) with dual rewards
  • RARL uses Group Relative Policy Optimization (GRPO) with visual semantic and action consistency rewards
  • Code2World-8B rivals the performance of GPT-5 and Gemini-3-Pro-Image on next-UI prediction
  • Improves downstream navigation, boosting the AndroidWorld success rate of Gemini-2.5-Flash by +9.5%
  • Implements a "Propose, Simulate, Select" pipeline for GUI agent enhancement
  • Evaluation on the Android Control (in-distribution, ID) and GUI Odyssey (out-of-distribution, OOD) benchmarks shows superior generalization
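
The RARL bullets above name GRPO with two rewards but give no formula. As a rough illustration only, the sketch below shows the group-relative advantage that GRPO is known for: each sampled generation is scored, then normalized against the mean and standard deviation of its own sampling group. The equal-weight sum in `combined_reward`, the weights themselves, and the example scores are assumptions made for this sketch and are not taken from the source.

```python
from statistics import mean, pstdev


def combined_reward(visual_semantic, action_consistency,
                    w_visual=0.5, w_action=0.5):
    """Blend the two reward signals into one scalar.

    ASSUMPTION: a weighted sum with equal weights; the source does not
    state how the visual semantic and action consistency rewards are combined.
    """
    return w_visual * visual_semantic + w_action * action_consistency


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize each reward within its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the sampled group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical HTML generations for the same (screenshot, action) prompt,
# scored as (visual_semantic, action_consistency). Values are made up.
group_scores = [(0.9, 1.0), (0.7, 0.0), (0.8, 1.0), (0.6, 0.0)]
rewards = [combined_reward(v, a) for v, a in group_scores]
advantages = group_relative_advantages(rewards)

for score, adv in zip(group_scores, advantages):
    print(score, round(adv, 3))
```

Generations that score above the group average receive a positive advantage and are reinforced; those below it receive a negative one. No separate value network is needed, since the group itself serves as the baseline.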

Concepts Covered

Images and Figures

  • ![code2world-a-gui-world-model-via-renderable-code-generation/img-0.png] — Project icon
  • ![code2world-a-gui-world-model-via-renderable-code-generation/img-1.png] — Framework illustration showing input GUI + action → renderable code → predicted screenshot
  • ![code2world-a-gui-world-model-via-renderable-code-generation/img-2.png] — Data synthesis pipeline and two-stage model optimization methodology
  • ![code2world-a-gui-world-model-via-renderable-code-generation/img-3.png] — "Propose, Simulate, Select" pipeline for GUI agent enhancement
  • ![code2world-a-gui-world-model-via-renderable-code-generation/img-4.png] — Quantitative comparison table across benchmarks
  • ![code2world-a-gui-world-model-via-renderable-code-generation/img-5.png] through ![code2world-a-gui-world-model-via-renderable-code-generation/img-8.png] — Qualitative comparison examples showing email app launch, news app navigation, reminder completion, and e-commerce filtering
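
The "Propose, Simulate, Select" pipeline illustrated in img-3 can be sketched as a short control loop: an agent proposes candidate actions, the world model simulates the screen each action would produce, and the best-scoring candidate is selected. The sketch below is a minimal illustration under stated assumptions; the function names, the callable signatures, and the toy stand-ins for the agent, the world model, and the scorer are all hypothetical and do not come from the source.

```python
from typing import Callable, List, Tuple


def propose_simulate_select(
    screenshot: str,
    goal: str,
    propose: Callable[[str, str], List[str]],
    simulate: Callable[[str, str], str],
    score: Callable[[str, str], float],
) -> Tuple[str, float]:
    """Return the (action, score) pair whose simulated outcome scores highest."""
    # Propose: the agent suggests candidate actions for the current screen.
    candidates = propose(screenshot, goal)
    if not candidates:
        raise ValueError("the proposer returned no candidate actions")

    scored = []
    for action in candidates:
        # Simulate: the world model predicts the next screen as HTML.
        predicted_html = simulate(screenshot, action)
        scored.append((action, score(predicted_html, goal)))

    # Select: keep the action with the best predicted outcome.
    return max(scored, key=lambda pair: pair[1])


# Toy stand-ins (hypothetical) so the sketch runs without any model.
def toy_propose(screenshot: str, goal: str) -> List[str]:
    return ["tap:settings", "tap:inbox", "scroll:down"]


def toy_simulate(screenshot: str, action: str) -> str:
    target = action.split(":", 1)[1]
    return f"<html><body><h1>{target}</h1></body></html>"


def toy_score(predicted_html: str, goal: str) -> float:
    return 1.0 if goal in predicted_html else 0.0


best_action, best_score = propose_simulate_select(
    "home.png", "inbox", toy_propose, toy_simulate, toy_score
)
print(best_action, best_score)
```

In a real setup the `simulate` callable would wrap the world model plus an HTML renderer, and `score` would be whatever judgment the agent uses to compare predicted screens against the task goal; both are left abstract here on purpose.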

Related Concepts