arXiv 2502.09992

Large Language Diffusion Models

By Shen Nie, Fengqi Zhu, et al.

Published 2025-02-14

Wiki summary


The capabilities of large language models (LLMs) are widely regarded as relying on autoregressive models (ARMs). We challenge this notion by introducing LLaDA, a diffusion model trained from scratch under the pre-training and supervised fine-tuning (SFT) paradigm. LLaDA employs a forward data masking process and a reverse generation process, parameterized by a Transformer to predict masked tokens. It provides a prin…
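As a rough illustration of the two processes named above, the following is a minimal toy sketch, not the paper's implementation. It assumes the forward process masks each token independently with probability `t`, and that the reverse process starts from a fully masked sequence, fills masked positions with a predictor, and re-masks a shrinking share of them at each step. The names `MASK`, `forward_mask`, `reverse_generate`, and `toy_predict` are hypothetical, and `toy_predict` merely stands in for the Transformer.

```python
import random

MASK = "<mask>"  # hypothetical mask symbol, not taken from the paper


def forward_mask(tokens, t, rng):
    """Forward process (assumed form): mask each token independently with probability t."""
    return [MASK if rng.random() < t else tok for tok in tokens]


def reverse_generate(length, predict, steps, rng):
    """Reverse process (assumed form): start fully masked and unmask over `steps` rounds."""
    seq = [MASK] * length
    for step in range(steps, 0, -1):
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        # Fill every currently masked position with the predictor's guess.
        for i in masked:
            seq[i] = predict(seq, i)
        # Re-mask a shrinking share of the just-filled positions;
        # on the final round (step == 1) nothing is re-masked.
        n_remask = len(masked) * (step - 1) // step
        for i in rng.sample(masked, n_remask):
            seq[i] = MASK
    return seq


def toy_predict(seq, i):
    """Stand-in for the mask predictor: returns a position-dependent dummy token."""
    return f"tok{i}"


rng = random.Random(0)
noisy = forward_mask(["a", "b", "c", "d"], 0.5, rng)
generated = reverse_generate(5, toy_predict, steps=4, rng=rng)
```

In this sketch, `t = 0` leaves the sequence untouched and `t = 1` masks everything, so the reverse loop can be read as walking from the fully masked end back toward clean text.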
