arXiv 2502.06768
Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions
By Jaeyeon Kim, Kulin Shah, et al.
Published 2025-02-10
Wiki summary
In recent years, masked diffusion models (MDMs) have emerged as a promising alternative for generative modeling over discrete domains. Compared to autoregressive models (ARMs), MDMs trade complexity at training time for flexibility at inference time. At training time, they must learn to solve an exponentially large number of infilling problems, but at inference time, they can decode tokens in essential…
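To make the "exponentially large number of infilling problems" concrete, the sketch below counts subproblems for a short sequence. It is an illustration only, not the paper's formal definition. It assumes each non-empty set of masked positions counts as one infilling problem, and that a left-to-right ARM faces one next-token problem per prefix. The function names are hypothetical.

```python
from itertools import combinations


def arm_subproblems(length: int) -> int:
    """Next-token problems for a left-to-right model: one per prefix (assumed counting)."""
    return length


def mdm_masking_patterns(length: int) -> int:
    """Non-empty subsets of positions that could be masked (assumed counting)."""
    return 2 ** length - 1


def enumerate_masks(length: int) -> list[frozenset[int]]:
    """Explicitly list every non-empty set of masked positions."""
    positions = range(length)
    return [
        frozenset(subset)
        for size in range(1, length + 1)
        for subset in combinations(positions, size)
    ]


if __name__ == "__main__":
    # Compare the linear growth of the ARM count with the exponential MDM count.
    for length in (4, 8, 16):
        print(length, arm_subproblems(length), mdm_masking_patterns(length))
```

Under this counting, the ARM column grows linearly with sequence length while the MDM column roughly doubles with every added position, which is the training-time cost the summary refers to.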