arXiv 2502.09992
Large Language Diffusion Models
By Shen Nie, Fengqi Zhu, et al.
Published 2025-02-14
The capabilities of large language models (LLMs) are widely regarded as relying on autoregressive models (ARMs). We challenge this notion by introducing LLaDA, a diffusion model trained from scratch under the pre-training and supervised fine-tuning (SFT) paradigm. LLaDA employs a forward data masking process and a reverse generation process, parameterized by a Transformer to predict masked tokens. It provides a prin…
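The forward masking process and the reverse generation process described above can be illustrated with a small, framework-free sketch. This is not the authors' implementation. It assumes the following:

- a masking ratio t in (0, 1], drawn uniformly during training;
- each token is masked independently with probability t;
- a cross-entropy loss computed only on masked positions, reweighted by 1/t and averaged over sequence length;
- a reverse sampler that starts from a fully masked sequence and unmasks positions on a linear schedule, keeping the most confident predictions.

`MASK`, `log_prob`, and `predict` are hypothetical stand-ins for the mask token id and the Transformer mask predictor.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

MASK = -1  # hypothetical mask-token id; a real vocabulary reserves its own [MASK] id


def forward_mask(tokens: Sequence[int], t: float, rng: random.Random) -> List[int]:
    """Forward process (assumed form): mask each token independently with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]


def masked_loss(
    tokens: Sequence[int],
    noisy: Sequence[int],
    t: float,
    log_prob: Callable[[Sequence[int], int, int], float],
) -> float:
    """Cross-entropy on masked positions only, reweighted by 1/t, averaged over length.

    log_prob(noisy, i, tok) is a hypothetical stand-in for the Transformer's
    log-probability of the clean token `tok` at position i given `noisy`.
    Requires t > 0 (otherwise nothing is masked and 1/t is undefined).
    """
    total = 0.0
    for i, (tok, x) in enumerate(zip(tokens, noisy)):
        if x == MASK:
            total -= log_prob(noisy, i, tok)
    return total / (t * len(tokens))


def reverse_generate(
    length: int,
    steps: int,
    predict: Callable[[Sequence[int]], List[Tuple[int, float]]],
) -> List[int]:
    """Reverse process sketch: start fully masked and unmask positions step by step.

    predict(seq) is a hypothetical mask predictor returning (token, confidence)
    for every position. At each step, the most confident predictions among the
    still-masked positions are kept; the rest stay masked (remasked) so that the
    masked count follows a linear schedule from `length` down to 0.
    """
    seq = [MASK] * length
    for step in range(steps):
        masked = [i for i, x in enumerate(seq) if x == MASK]
        if not masked:
            break
        # Positions that should remain masked after this step (linear schedule t: 1 -> 0).
        remaining = math.floor(length * (1 - (step + 1) / steps))
        n_reveal = len(masked) - remaining
        preds = predict(seq)
        # Reveal the highest-confidence positions first.
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[:n_reveal]:
            seq[i] = preds[i][0]
    return seq
```

With an oracle `predict` that always returns the target tokens, `reverse_generate` reconstructs the target after `steps` iterations. Because the final step's schedule leaves zero positions masked, every position is filled regardless of `steps`. Confidence-based remasking is only one option; remasking a random subset of the predicted positions is a simpler alternative.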