arXiv 2510.13786

The Art of Scaling Reinforcement Learning Compute for LLMs

By Devvrit Khatri, Lovish Madaan, et al.

Published 2025-10-15

Reinforcement learning (RL) has become central to training large language models (LLMs), yet the field lacks predictive scaling methodologies comparable to those established for pre-training. Despite rapidly rising compute budgets, there is no principled understanding of how to evaluate algorithmic improvements for scaling RL compute. We present the first large-scale systematic study, amounting to more than 400,000…