arXiv 2511.03276
Diffusion Language Models are Super Data Learners
By Jinjie Ni, Qian Liu, et al.
Published 2025-11-05
Under strictly controlled pre-training settings, we observe a Crossover: when unique data is limited, diffusion language models (DLMs) consistently surpass autoregressive (AR) models by training for more epochs. The crossover shifts later with more or higher-quality data, earlier with larger models, and persists across dense and sparse architectures. We attribute the gains to three compounding factors: (1) any-order…