arXiv 2502.08524

LLM Pretraining with Continuous Concepts

By Jihoon Tack, Jack Lanchantin, et al.

Published 2025-02-12

Next token prediction has been the standard training objective used in large language model pretraining. Representations are learned as a result of optimizing for token-level perplexity. We propose Continuous Concept Mixing (CoCoMix), a novel pretraining framework that combines discrete next token prediction with continuous concepts. Specifically, CoCoMix predicts continuous concepts learned from a pretrained sparse…
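As a rough illustration of what combining a discrete next-token objective with a continuous concept objective could look like, the sketch below adds a weighted concept-prediction term to the usual cross-entropy. It is only a minimal sketch under stated assumptions: the function names, the mean-squared-error concept loss, and the `concept_weight` value are hypothetical choices made for this example, not details taken from the paper, and the concept targets are treated as given vectors.

```python
import math


def softmax_cross_entropy(logits, target_index):
    """Cross-entropy of a single next-token prediction, computed from raw logits."""
    # Subtract the max logit before exponentiating for numerical stability.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    return log_sum_exp - logits[target_index]


def mean_squared_error(predicted, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def combined_loss(token_logits, next_token, concept_pred, concept_target,
                  concept_weight=0.1):
    """Hypothetical mixed objective: token loss plus a weighted concept loss.

    ASSUMPTION: both the MSE concept loss and the 0.1 weight are illustrative
    stand-ins, not the formulation used in the paper.
    """
    token_loss = softmax_cross_entropy(token_logits, next_token)
    concept_loss = mean_squared_error(concept_pred, concept_target)
    total = token_loss + concept_weight * concept_loss
    return total, token_loss, concept_loss


if __name__ == "__main__":
    total, token_loss, concept_loss = combined_loss(
        token_logits=[0.0, 0.0, 0.0, 0.0],  # uniform prediction over 4 tokens
        next_token=2,
        concept_pred=[1.0, 0.0],
        concept_target=[0.0, 0.0],
    )
    print(total, token_loss, concept_loss)
```

Setting `concept_weight` to zero recovers plain next-token prediction, which makes the added term easy to ablate in a toy setting like this one.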
