arXiv 2510.07364

Base Models Know How to Reason, Thinking Models Learn When

By Constantin Venhoff, Iván Arcuschin, et al.

Published 2025-10-08

Why do thinking language models like DeepSeek R1 outperform their base counterparts? Despite consistent performance gains, it remains unclear to what extent thinking models learn entirely new reasoning capabilities or repurpose pre-existing base model ones. In this work, we propose a hybrid model where we activate reasoning mechanisms in base models at the right time to elicit thinking-model-level reasoning chains,…
