arXiv 2601.18778
Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability
By Shobhita Sundaram, John Quan, et al.
Published 2026-01-26
Can a model learn to escape its own learning plateau? Reinforcement learning methods for finetuning large reasoning models stall on datasets with low initial success rates, and thus little training signal. We investigate a fundamental question: Can a pretrained LLM leverage latent knowledge to generate an automated curriculum for problems it cannot solve? To explore this, we design SOAR: A self-improvement framework…