arXiv 2502.09992
Large Language Diffusion Models
By Shen Nie, Fengqi Zhu, et al.
Published 2025-02-14
The capabilities of large language models (LLMs) are widely regarded as relying on autoregressive models (ARMs). We challenge this notion by introducing LLaDA, a diffusion model trained from scratch under the pre-training and supervised fine-tuning (SFT) paradigm. LLaDA employs a forward data masking process and a reverse generation process, parameterized by a Transformer to predict masked tokens. It provides a prin…
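The forward masking and reverse generation processes mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes each token is masked independently with probability t, and that generation fills masked positions over a fixed number of steps. The names MASK, forward_mask, reverse_generate, and predict are hypothetical, and predict stands in for the Transformer mask predictor.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, not taken from the paper


def forward_mask(tokens, t, rng):
    """Forward process (assumed form): mask each token independently with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]


def reverse_generate(masked, predict, steps, rng):
    """Reverse process (toy version): fill masked positions over `steps` rounds.

    `predict(seq, i)` is a stand-in for a Transformer that predicts the
    token at masked position i given the partially masked sequence.
    """
    seq = list(masked)
    for step in range(steps, 0, -1):
        masked_idx = [i for i, tok in enumerate(seq) if tok == MASK]
        if not masked_idx:
            break
        # Unmask a roughly equal share of the remaining positions each round;
        # on the last round (step == 1) this covers everything still masked.
        k = -(-len(masked_idx) // step)  # ceiling division
        for i in rng.sample(masked_idx, k):
            seq[i] = predict(seq, i)
    return seq


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = ["diffusion", "models", "can", "generate", "text"]
    noisy = forward_mask(tokens, t=0.6, rng=rng)
    # An oracle predictor that looks up the original token, for illustration only.
    restored = reverse_generate(noisy, lambda seq, i: tokens[i], steps=3, rng=rng)
    print(noisy)
    print(restored)
```

With the oracle predictor the sequence is restored exactly; a trained model would instead sample or pick each token from its predicted distribution.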