arXiv 2512.16649

JustRL: Scaling a 1.5B LLM with a Simple RL Recipe

By Bingxiang He, Zekai Qu, et al.

Published 2025-12-18

Recent advances in reinforcement learning for large language models have converged on increasing complexity: multi-stage training pipelines, dynamic hyperparameter schedules, and curriculum learning strategies. This raises a fundamental question: Is this complexity necessary? We present JustRL, a minimal approach using single-stage training with fixed hyperparameters that achieves state-of-the-art performance on tw…