arXiv 2502.08524
LLM Pretraining with Continuous Concepts
By Jihoon Tack, Jack Lanchantin, et al.
Published 2025-02-12
Next token prediction has been the standard training objective used in large language model pretraining. Representations are learned as a result of optimizing for token-level perplexity. We propose Continuous Concept Mixing (CoCoMix), a novel pretraining framework that combines discrete next token prediction with continuous concepts. Specifically, CoCoMix predicts continuous concepts learned from a pretrained sparse…
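The abstract is cut off before it spells out the training objective. The sketch below is only an illustration, not the authors' method. It assumes that "combining" next-token prediction with concept prediction means adding a weighted concept-prediction term to the usual next-token cross-entropy. The weight `lam`, the function names, and the idea that concept targets are given as discrete latent indices (for example, from a pretrained sparse autoencoder) are all hypothetical choices made for illustration.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    # Negative log-probability assigned to the target index
    return -math.log(softmax(logits)[target])

def mixed_loss(token_logits, next_token, concept_logits, concept_targets, lam=0.1):
    """Hypothetical combined objective: next-token CE + lam * concept-prediction CE.

    concept_targets: indices of concepts assumed active for this position
    (assumed to come from a pretrained sparse autoencoder; here simply given).
    """
    ntp = cross_entropy(token_logits, next_token)
    concept = sum(cross_entropy(concept_logits, c) for c in concept_targets) / len(concept_targets)
    return ntp + lam * concept

# Uniform logits: next-token term is log(4), concept term is log(2)
print(mixed_loss([0.0, 0.0, 0.0, 0.0], 1, [0.0, 0.0], [0], lam=0.1))
```

Setting `lam=0` recovers plain next-token prediction. Treating the concept term as an auxiliary loss keeps the token objective unchanged while adding a separate training signal tied to the concept targets.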