arXiv 2502.09992

Large Language Diffusion Models

By Shen Nie, Fengqi Zhu, et al.

Published 2025-02-14

The capabilities of large language models (LLMs) are widely regarded as relying on autoregressive models (ARMs). We challenge this notion by introducing LLaDA, a diffusion model trained from scratch under the pre-training and supervised fine-tuning (SFT) paradigm. LLaDA employs a forward data masking process and a reverse generation process, parameterized by a Transformer to predict masked tokens. It provides a prin…
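As a rough illustration of the forward data masking process mentioned in the abstract, the sketch below replaces each token with a mask symbol independently with probability t and then lists the positions a mask predictor would be asked to recover. This is a minimal sketch and not the authors' implementation: the `MASK` symbol, the function names, the per-token independent masking, and the whitespace tokenization are all assumptions made for illustration, and the Transformer that predicts the masked tokens is omitted entirely.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, chosen for this illustration


def forward_mask(tokens, t, rng):
    """Replace each token with MASK independently with probability t (assumed scheme)."""
    return [MASK if rng.random() < t else tok for tok in tokens]


def masked_positions(noisy):
    """Indices where a mask predictor would be trained to recover the original token."""
    return [i for i, tok in enumerate(noisy) if tok == MASK]


rng = random.Random(0)  # seeded only so that repeated runs give the same masking
tokens = "diffusion models can generate text".split()
noisy = forward_mask(tokens, 0.5, rng)
targets = masked_positions(noisy)
print(noisy)
print(targets)
```

At t = 0 the sequence is left untouched and at t = 1 every token is masked, so under this assumed scheme t acts as a noise level running from clean data to a fully masked sequence.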
