arXiv 2509.23184

PonderLM-2: Pretraining LLM with Latent Thoughts in Continuous Space

By Boyi Zeng, He Li, et al.

Published 2025-09-27



The remarkable success of Chain-of-Thought (CoT), which enhances performance by scaling generation steps at test time, inspires us to ask: can we leverage a similar scaling of computational steps during pretraining to improve the generation of each individual token? To address this, we propose a novel pretraining methodology: Pretraining Language Models with Latent Thoughts (PonderLM-2). Our approach pretrains a la…
