arXiv 2504.13837

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

By Yang Yue, Zhiqi Chen, et al.

Published 2025-04-18

Wiki summary

Reinforcement Learning with Verifiable Rewards (RLVR) has recently demonstrated notable success in enhancing the reasoning performance of large language models (LLMs), particularly on mathematics and programming tasks. Similar to how traditional RL helps agents explore and learn new strategies, RLVR is believed to enable LLMs to continuously self-improve, thus acquiring novel reasoning abilities beyond those of the…
