arXiv 2509.23184
PonderLM-2: Pretraining LLM with Latent Thoughts in Continuous Space
By Boyi Zeng, He Li, et al.
Published 2025-09-27
The remarkable success of Chain-of-Thought (CoT), which enhances performance by scaling generation steps at test time, inspires us to ask: can we leverage a similar scaling of computational steps during pretraining to improve the generation of each individual token? To address this, we propose a novel pretraining methodology: Pretraining Language Models with Latent Thoughts (PonderLM-2). Our approach pretrains a la…