arXiv 2502.08524

LLM Pretraining with Continuous Concepts

By Jihoon Tack, Jack Lanchantin, et al.

Published 2025-02-12

Next-token prediction has been the standard training objective in large language model pretraining, with representations learned as a by-product of optimizing token-level perplexity. We propose Continuous Concept Mixing (CoCoMix), a novel pretraining framework that combines discrete next-token prediction with continuous concepts. Specifically, CoCoMix predicts continuous concepts learned from a pretrained sparse…