source: "raw/articles/frontier-rl-is-cheaper-than-you-think.md"

Summary: Frontier RL Is Cheaper Than You Think

TL;DR: Reinforcement learning at frontier scale is more affordable than assumed because weight updates between RL checkpoints are 98%+ sparse, enabling delta compression that reduces cross-region transfers by ~94% and makes distributed rollout fleets practical.

Key Points

The traditional mega-cluster approach assumes you must ship a full 1 TiB checkpoint on every policy update, but this is unnecessary
Between consecutive RL checkpoints, >98% of weights remain bit-equivalent in bf16 format due to small learning rates and sparse RL signals
Delta compression reduces the average transfer from 1024 GiB to ~20.3 GiB (1.98% of the full model); cross-region transfer volume falls by ~94% overall
Asynchronous RL tolerates a few minutes of policy staleness in exchange for much better compute efficiency
Multi-region rollout capacity becomes usable when weight updates are small and routine rather than stop-the-world events
Fireworks supported Cursor's Composer 2 training across 3-4 clusters worldwide using this architecture
Approach works best for frontier-scale models where trainer and rollout can't fit on one compact cluster
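
The delta idea in the points above can be sketched in a few lines. This is a minimal illustration, not the implementation described in the article: it assumes weights are compared as raw 16-bit patterns (a stand-in for bf16 values), and the helper names `make_delta` and `apply_delta` are hypothetical. A production format would also encode indices far more compactly than a list of pairs.

```python
import random

def make_delta(old_bits, new_bits):
    """Collect (index, new_value) pairs for every weight whose bit pattern changed."""
    return [(i, new) for i, (old, new) in enumerate(zip(old_bits, new_bits)) if old != new]

def apply_delta(old_bits, delta):
    """Rebuild the new checkpoint from the old one plus the delta."""
    rebuilt = list(old_bits)
    for index, value in delta:
        rebuilt[index] = value
    return rebuilt

# Synthetic "checkpoint": 16-bit integers standing in for bf16 bit patterns.
random.seed(0)
n = 100_000
old = [random.getrandbits(16) for _ in range(n)]

# Simulate an RL step that touches 2% of the weights (assumed figure for the demo).
new = list(old)
for i in random.sample(range(n), n // 50):
    new[i] ^= 1  # flip the lowest bit so the value is guaranteed to differ

delta = make_delta(old, new)
rebuilt = apply_delta(old, delta)
changed_fraction = len(delta) / n

print(f"changed weights: {len(delta)} of {n} ({changed_fraction:.2%})")
print("reconstruction exact:", rebuilt == new)
```

Comparing bit patterns rather than numeric values means the reconstruction is exact, so a rollout node that applies the delta ends up with the same weights the trainer holds.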

Concepts Covered

Delta Compression — core technique enabling 98%+ reduction in checkpoint transfer sizes
Asynchronous RL — training pattern that trades slight policy staleness for compute efficiency
Multi-region Deployment — distributing rollout capacity across geographic regions and cloud providers
Policy Staleness — the acceptable lag between trainer updates and rollout fleet policy versions
Rollout Fleet Architecture — infrastructure pattern separating training from inference sampling
Weight Update Sparsity — empirical observation that most weights remain unchanged between RL steps
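
One way to picture policy staleness in asynchronous RL is a freshness filter on incoming rollouts. The sketch below is an assumption-laden illustration: the article describes staleness in minutes, whereas this example measures lag in policy versions, and `Rollout`, `filter_fresh`, and `max_lag` are hypothetical names rather than anything from the source.

```python
from dataclasses import dataclass

@dataclass
class Rollout:
    policy_version: int  # version of the policy that generated this sample
    reward: float

def filter_fresh(rollouts, trainer_version, max_lag):
    """Keep rollouts generated by a policy at most `max_lag` versions behind the trainer."""
    return [
        r for r in rollouts
        if 0 <= trainer_version - r.policy_version <= max_lag
    ]

batch = [Rollout(10, 1.0), Rollout(9, 0.5), Rollout(6, 0.2)]
fresh = filter_fresh(batch, trainer_version=10, max_lag=2)
print([r.policy_version for r in fresh])
```

Dropping stale samples is only one possible policy; the point of the sketch is that a bounded lag lets the rollout fleet keep generating while new weights are still in flight.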

Images and Figures

Checkpoint Cadence diagram showing periodic full checkpoints with delta updates in between
Delta-Compressed Weight Updates flowchart illustrating the 3-step process: identify changed weights, package tensors, reconstruct and swap
Policy Freshness Timeline comparing async updates vs full restarts and their impact on serving gaps
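
The checkpoint cadence can be made concrete with some illustrative arithmetic. The summary gives two figures, a ~20.3 GiB delta (1.98% of 1024 GiB) and a ~94% overall reduction, but it does not state how often a full checkpoint is sent. The sketch below assumes, purely for illustration, that one in every `period` updates is a full checkpoint and the rest are deltas; `period` is a hypothetical parameter, not a number from the article.

```python
FULL_GIB = 1024.0   # full checkpoint size quoted in the summary
DELTA_GIB = 20.3    # average delta size quoted in the summary

def average_transfer_gib(period):
    """Average GiB per update when one in every `period` updates is a full checkpoint."""
    return (FULL_GIB + (period - 1) * DELTA_GIB) / period

def reduction(period):
    """Fractional bandwidth saving relative to shipping a full checkpoint every update."""
    return 1.0 - average_transfer_gib(period) / FULL_GIB

per_delta_reduction = 1.0 - DELTA_GIB / FULL_GIB
print(f"saving for a delta-only update: {per_delta_reduction:.2%}")

for period in (1, 10, 25, 100):
    print(f"full checkpoint every {period:>3} updates: saving {reduction(period):.2%}")
```

Under this assumed model, a cadence of roughly one full checkpoint per 25 updates lands near a 94% saving, which is one plausible way the two figures could fit together. The article itself may account for the difference in another way.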

Related Concepts