arXiv 2502.08524
LLM Pretraining with Continuous Concepts
By Jihoon Tack, Jack Lanchantin, et al.
Published 2025-02-12
Next token prediction has been the standard training objective used in large language model pretraining. Representations are learned as a result of optimizing for token-level perplexity. We propose Continuous Concept Mixing (CoCoMix), a novel pretraining framework that combines discrete next token prediction with continuous concepts. Specifically, CoCoMix predicts continuous concepts learned from a pretrained sparse…
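The abstract describes training on a discrete next-token loss together with a continuous concept signal. One way to picture such a combined objective is a weighted sum of two losses. The sketch below rests on stated assumptions and is not the authors' implementation. The concept targets are placeholder vectors standing in for concepts from the pretrained sparse model mentioned above. The mean-squared-error concept loss and the weight `lam` are hypothetical choices. The sketch also does not model how predicted concepts are mixed back into the model's computation.

```python
import math

def softmax_cross_entropy(logits, target):
    """Next-token cross-entropy for one position: -log softmax(logits)[target]."""
    m = max(logits)  # subtract the max logit for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def concept_loss(pred, target):
    """Hypothetical stand-in concept loss: mean squared error between a
    predicted and a target continuous concept vector."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def mixed_objective(token_logits, next_tokens, concept_preds, concept_targets, lam=0.1):
    """Hypothetical combined loss: average next-token cross-entropy plus
    lam times the average concept-prediction loss, over all positions."""
    n = len(next_tokens)
    ntp = sum(softmax_cross_entropy(l, t) for l, t in zip(token_logits, next_tokens)) / n
    cpt = sum(concept_loss(p, c) for p, c in zip(concept_preds, concept_targets)) / n
    return ntp + lam * cpt, ntp, cpt

# Toy usage: 2 positions, a 3-token vocabulary, 2-dimensional concept vectors.
token_logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
next_tokens = [0, 2]
concept_preds = [[0.5, 0.0], [1.0, 1.0]]
concept_targets = [[0.0, 0.0], [1.0, 0.0]]  # placeholder concept targets
total, ntp, cpt = mixed_objective(token_logits, next_tokens, concept_preds, concept_targets)
print(f"total={total:.4f} ntp={ntp:.4f} concept={cpt:.4f}")
```

Setting `lam=0` recovers plain next-token prediction. A positive `lam` adds pressure on the model to also predict the continuous concept targets.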