arXiv 2601.18778
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability
By Shobhita Sundaram, John Quan, et al.
Published 2026-01-26
Can a model learn to escape its own learning plateau? Reinforcement learning methods for finetuning large reasoning models stall on datasets with low initial success rates, and thus little training signal. We investigate a fundamental question: Can a pretrained LLM leverage latent knowledge to generate an automated curriculum for problems it cannot solve? To explore this, we design SOAR: A self-improvement framework…