arXiv 2502.09992
Large Language Diffusion Models
By Shen Nie, Fengqi Zhu, et al.
Published 2025-02-14
The capabilities of large language models (LLMs) are widely regarded as relying on autoregressive models (ARMs). We challenge this notion by introducing LLaDA, a diffusion model trained from scratch under the pre-training and supervised fine-tuning (SFT) paradigm. LLaDA employs a forward data masking process and a reverse generation process, parameterized by a Transformer to predict masked tokens. It provides a prin…
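The forward masking and reverse generation processes named in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation. It assumes that the forward process masks each token independently with probability t, and that the reverse process starts from a fully masked sequence and fills in a share of the masked positions at each step. The names `MASK`, `forward_mask`, `reverse_generate`, and `toy_predict` are hypothetical, and `toy_predict` is a fixed lookup standing in for the Transformer mask predictor.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, not taken from the paper


def forward_mask(tokens, t, rng):
    """Forward process (assumed form): mask each token independently with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]


def reverse_generate(predict, length, steps, rng):
    """Reverse process (assumed form): start fully masked, fill in positions step by step."""
    seq = [MASK] * length
    for step in range(steps):
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        if not masked:
            break
        # Unmask an even share of the remaining masked positions at each step.
        remaining_steps = steps - step
        k = -(-len(masked) // remaining_steps)  # ceiling division
        chosen = rng.sample(masked, k)
        predictions = predict(seq)  # one predicted token per position
        for i in chosen:
            seq[i] = predictions[i]
    return seq


def toy_predict(seq):
    """Stand-in for the Transformer mask predictor: always returns a fixed target."""
    target = ["diffusion", "models", "can", "generate", "text"]
    return target[: len(seq)]


if __name__ == "__main__":
    rng = random.Random(0)
    sentence = ["diffusion", "models", "can", "generate", "text"]
    print(forward_mask(sentence, 0.5, rng))
    print(reverse_generate(toy_predict, 5, 3, rng))
```

In a real model the predictor would be a trained network conditioned on the partially masked sequence, and the choice of which positions to unmask at each step is a design decision that this sketch makes at random.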