arXiv 2502.08524
LLM Pretraining with Continuous Concepts
By Jihoon Tack, Jack Lanchantin, et al.
Published 2025-02-12
Next-token prediction has been the standard training objective used in large language model pretraining. Representations are learned as a result of optimizing for token-level perplexity. We propose Continuous Concept Mixing (CoCoMix), a novel pretraining framework that combines discrete next-token prediction with continuous concepts. Specifically, CoCoMix predicts continuous concepts learned from a pretrained sparse…
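For reference, the standard objective the abstract contrasts against can be sketched briefly. The sketch covers only the usual next-token cross-entropy loss and the token-level perplexity derived from it. It does not implement CoCoMix or its concept-prediction component, which the truncated abstract does not specify. The function names (`log_softmax`, `next_token_nll`, `perplexity`) are hypothetical illustrations, written in plain Python rather than with a tensor library to keep the example self-contained.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary score vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def next_token_nll(logits: Sequence[Sequence[float]], tokens: Sequence[int]) -> float:
    """Mean negative log-likelihood of next-token prediction (hypothetical helper).

    logits[t] holds the model's scores after reading tokens[0..t];
    it is scored against the *next* token, tokens[t + 1].
    """
    if len(logits) != len(tokens) - 1:
        raise ValueError("need exactly one logit vector per predicted position")
    total = 0.0
    for t, row in enumerate(logits):
        target = tokens[t + 1]
        total -= log_softmax(row)[target]  # cross-entropy against the true next token
    return total / len(logits)


def perplexity(logits: Sequence[Sequence[float]], tokens: Sequence[int]) -> float:
    """Token-level perplexity: exp of the mean next-token negative log-likelihood."""
    return math.exp(next_token_nll(logits, tokens))


if __name__ == "__main__":
    tokens = [0, 1, 2, 3]
    uniform = [[0.0] * 4 for _ in range(len(tokens) - 1)]
    print(perplexity(uniform, tokens))  # uniform guessing over 4 tokens
```

A model whose scores are uniform over a vocabulary of size V has perplexity V (up to floating-point rounding), while a model that puts nearly all probability on each correct next token approaches a perplexity of 1. Minimizing this quantity is what "optimizing for token-level perplexity" refers to.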