arXiv 2503.09512

Reinforcement Learning is all You Need

By Yongsheng Lian

Published 2025-03-12

Inspired by the success of DeepSeek R1 in reasoning via reinforcement learning without human feedback, we train a 3B language model using the Countdown Game with pure reinforcement learning. Our model outperforms baselines on four of five benchmarks, demonstrating improved generalization beyond its training data. Notably, response length does not correlate with reasoning quality, and while "aha moments" emerge, they…
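The abstract does not say how Countdown answers are scored, so the following is only a minimal sketch of what a rule-based, verifiable reward for such a game could look like. It assumes a common variant of the game: the model is given a list of integers and a target, and must write an arithmetic expression that uses each number exactly once with +, -, *, / to reach the target. The function name `countdown_reward`, the helper `_evaluate`, the binary 1.0/0.0 reward, and the "use every number once" rule are all illustrative assumptions, not details taken from the paper.

```python
import ast
import operator

# Binary operators permitted in a candidate expression (assumed rule set).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node, used):
    """Evaluate a parsed expression, recording every integer literal it uses."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        used.append(node.value)
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    # Anything else (names, calls, unary operators, ...) is rejected.
    raise ValueError("unsupported expression")


def countdown_reward(expression, numbers, target):
    """Hypothetical reward: 1.0 if the expression is valid and hits the target."""
    try:
        tree = ast.parse(expression, mode="eval")
        used = []
        value = _evaluate(tree.body, used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return 0.0
    # Assumed rule: each provided number must appear exactly once.
    if sorted(used) != sorted(numbers):
        return 0.0
    return 1.0 if abs(value - target) < 1e-9 else 0.0


if __name__ == "__main__":
    print(countdown_reward("(25 - 5) * 4", [4, 5, 25], 80))
    print(countdown_reward("25 * 4", [4, 5, 25], 100))
```

Because the check is purely mechanical, a reward of this kind needs no human labels, which is the property that makes a game like Countdown convenient for reinforcement learning without human feedback. Parsing with `ast` rather than calling `eval` keeps arbitrary model output from being executed.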
