arXiv 2502.09992

Large Language Diffusion Models

By Shen Nie, Fengqi Zhu, et al.

Published 2025-02-14

The capabilities of large language models (LLMs) are widely regarded as relying on autoregressive models (ARMs). We challenge this notion by introducing LLaDA, a diffusion model trained from scratch under the pre-training and supervised fine-tuning (SFT) paradigm. LLaDA employs a forward data masking process and a reverse generation process, parameterized by a Transformer to predict masked tokens. It provides a prin…
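As a rough illustration of the two processes named in the abstract, the sketch below masks a token sequence and then fills the masked positions back in. It is a minimal toy under stated assumptions, not the paper's implementation: the excerpt does not say how tokens are selected for masking, so independent masking with probability `t` is assumed here, and the names `MASK`, `forward_mask`, `reverse_fill`, and `predict` are hypothetical. The `predict` callable stands in for the Transformer that predicts masked tokens.

```python
import random

# Hypothetical mask symbol; the excerpt does not name the actual token.
MASK = "[MASK]"


def forward_mask(tokens, t, rng):
    """Forward process (assumed form): mask each token independently with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]


def reverse_fill(tokens, predict):
    """Reverse process (toy form): replace each masked position with a prediction.

    `predict(tokens, i)` is a caller-supplied stand-in for the mask predictor;
    it receives the partially masked sequence and the index to fill.
    """
    return [predict(tokens, i) if tok == MASK else tok
            for i, tok in enumerate(tokens)]


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = ["diffusion", "models", "can", "generate", "text"]
    noisy = forward_mask(sentence, t=0.5, rng=rng)
    # A trivial predictor that looks up the original token, for demonstration only.
    restored = reverse_fill(noisy, lambda toks, i: sentence[i])
    print(noisy)
    print(restored)
```

With `t=0` nothing is masked and with `t=1` every token is masked, so `t` acts as the noise level in this toy version.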
