arXiv 2512.16649

JustRL: Scaling a 1.5B LLM with a Simple RL Recipe

By Bingxiang He, Zekai Qu, et al.

Published 2025-12-18

Recent advances in reinforcement learning for large language models have converged on increasing complexity: multi-stage training pipelines, dynamic hyperparameter schedules, and curriculum learning strategies. This raises a fundamental question: Is this complexity necessary? We present JustRL, a minimal approach using single-stage training with fixed hyperparameters that achieves state-of-the-art performance on tw…
