arXiv 2511.03276

Diffusion Language Models are Super Data Learners

By Jinjie Ni, Qian Liu, et al.

Published 2025-11-05

Under strictly controlled pre-training settings, we observe a Crossover: when unique data is limited, diffusion language models (DLMs) consistently surpass autoregressive (AR) models by training for more epochs. The crossover shifts later with more or higher-quality data, earlier with larger models, and persists across dense and sparse architectures. We attribute the gains to three compounding factors: (1) any-order…
