arXiv:2510.13786
The Art of Scaling Reinforcement Learning Compute for LLMs
By Devvrit Khatri, Lovish Madaan, et al.
Published 2025-10-15
Reinforcement learning (RL) has become central to training large language models (LLMs), yet the field lacks predictive scaling methodologies comparable to those established for pre-training. Despite rapidly rising compute budgets, there is no principled understanding of how to evaluate algorithmic improvements for scaling RL compute. We present the first large-scale systematic study, amounting to more than 400,000…