arXiv 2510.07364

Base Models Know How to Reason, Thinking Models Learn When

By Constantin Venhoff, Iván Arcuschin, et al.

Published 2025-10-08

Why do thinking language models like DeepSeek R1 outperform their base counterparts? Despite consistent performance gains, it remains unclear to what extent thinking models learn entirely new reasoning capabilities or repurpose pre-existing base model ones. In this work, we propose a hybrid model where we activate reasoning mechanisms in base models at the right time to elicit thinking-model-level reasoning chains,…
