arXiv 2510.13786
The Art of Scaling Reinforcement Learning Compute for LLMs
By Devvrit Khatri, Lovish Madaan, et al.
Published 2025-10-15
Reinforcement learning (RL) has become central to training large language models (LLMs), yet the field lacks predictive scaling methodologies comparable to those established for pre-training. Despite rapidly rising compute budgets, there is no principled understanding of how to evaluate algorithmic improvements for scaling RL compute. We present the first large-scale systematic study, amounting to more than 400,000…