arXiv 2502.08524

LLM Pretraining with Continuous Concepts

By Jihoon Tack, Jack Lanchantin, et al.

Published 2025-02-12

Next token prediction has been the standard training objective used in large language model pretraining. Representations are learned as a result of optimizing for token-level perplexity. We propose Continuous Concept Mixing (CoCoMix), a novel pretraining framework that combines discrete next token prediction with continuous concepts. Specifically, CoCoMix predicts continuous concepts learned from a pretrained sparse…
