arXiv 2510.07364

Base Models Know How to Reason, Thinking Models Learn When

By Constantin Venhoff, Iván Arcuschin, et al.

Published 2025-10-08

Why do thinking language models like DeepSeek R1 outperform their base counterparts? Despite consistent performance gains, it remains unclear to what extent thinking models learn entirely new reasoning capabilities or repurpose pre-existing base model ones. In this work, we propose a hybrid model where we activate reasoning mechanisms in base models at the right time to elicit thinking-model-level reasoning chains,…