arXiv 2601.18778

Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability

By Shobhita Sundaram, John Quan, et al.

Published 2026-01-26

Can a model learn to escape its own learning plateau? Reinforcement learning methods for finetuning large reasoning models stall on datasets with low initial success rates, and thus little training signal. We investigate a fundamental question: Can a pretrained LLM leverage latent knowledge to generate an automated curriculum for problems it cannot solve? To explore this, we design SOAR: A self-improvement framework…
