arXiv 2509.23184
PonderLM-2: Pretraining LLM with Latent Thoughts in Continuous Space
By Boyi Zeng, He Li, et al.
Published 2025-09-27
Wiki summary
The remarkable success of Chain-of-Thought (CoT), which enhances performance by scaling generation steps at test time, inspires us to ask: can we leverage a similar scaling of computational steps during pretraining to improve the generation of each individual token? To address this, we propose a novel pretraining methodology: Pretraining Language Models with Latent Thoughts (PonderLM-2). Our approach pretrains a la…