arXiv 2510.13786

The Art of Scaling Reinforcement Learning Compute for LLMs

By Devvrit Khatri, Lovish Madaan, et al.

Published 2025-10-15

Reinforcement learning (RL) has become central to training large language models (LLMs), yet the field lacks predictive scaling methodologies comparable to those established for pre-training. Despite rapidly rising compute budgets, there is no principled understanding of how to evaluate algorithmic improvements for scaling RL compute. We present the first large-scale systematic study, amounting to more than 400,000…
