arXiv 2510.13786

The Art of Scaling Reinforcement Learning Compute for LLMs

By Devvrit Khatri, Lovish Madaan, et al.

Published 2025-10-15

Reinforcement learning (RL) has become central to training large language models (LLMs), yet the field lacks predictive scaling methodologies comparable to those established for pre-training. Despite rapidly rising compute budgets, there is no principled understanding of how to evaluate algorithmic improvements for scaling RL compute. We present the first large-scale systematic study, amounting to more than 400,000…
