arXiv 2601.18778

Teaching Models to Teach Themselves: Reasoning at the Edge of Learnability

By Shobhita Sundaram, John Quan, et al.

Published 2026-01-26

Can a model learn to escape its own learning plateau? Reinforcement learning methods for finetuning large reasoning models stall on datasets with low initial success rates, and thus little training signal. We investigate a fundamental question: Can a pretrained LLM leverage latent knowledge to generate an automated curriculum for problems it cannot solve? To explore this, we design SOAR: A self-improvement framework…