arXiv 2502.06768
Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions
By Jaeyeon Kim, Kulin Shah, et al.
Published 2025-02-10
In recent years, masked diffusion models (MDMs) have emerged as a promising alternative approach for generative modeling over discrete domains. Compared to autoregressive models (ARMs), MDMs trade off complexity at training time with flexibility at inference time. At training time, they must learn to solve an exponentially large number of infilling problems, but at inference time, they can decode tokens in essential…
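The training/inference trade-off above can be made concrete with a small sketch. It assumes the common absorbing-state formulation, in which each token is masked independently at a rate t drawn uniformly from (0, 1). Under that assumption, every subset of positions is a possible masking pattern, so a length-L sequence gives 2^L infilling problems. At inference, decoding starts from a fully masked sequence and unmasks one position per step in whatever order a chosen rule picks. The mask marker (`None`), the function names, the right-to-left ordering rule, and the toy predictor are all hypothetical illustrations and are not taken from the paper.

```python
import random
from typing import Callable, List, Optional, Tuple

MASK = None  # hypothetical mask marker: None stands for a masked position


def count_infilling_problems(seq_len: int) -> int:
    """Any subset of positions can be masked, so a length-L sequence has
    2**L distinct masking patterns (infilling problems)."""
    return 2 ** seq_len


def sample_infilling_problem(
    tokens: List[int], rng: random.Random
) -> Tuple[List[Optional[int]], List[int]]:
    """Training-time sketch (assumed absorbing-state forward process):
    draw t ~ U(0, 1) and mask each token independently with probability t.
    A model would be trained to recover the tokens at the returned positions."""
    t = rng.random()
    masked: List[Optional[int]] = [MASK if rng.random() < t else tok for tok in tokens]
    targets = [i for i, tok in enumerate(masked) if tok is MASK]
    return masked, targets


def decode_any_order(
    seq_len: int,
    predict: Callable[[List[Optional[int]], int], int],
    choose_position: Callable[[List[Optional[int]]], int],
) -> List[int]:
    """Inference-time sketch: start fully masked and unmask one position per
    step. `choose_position` sets the decoding order (left-to-right, random,
    or an adaptive rule); `predict` fills the chosen position."""
    seq: List[Optional[int]] = [MASK] * seq_len
    for _ in range(seq_len):
        pos = choose_position(seq)
        assert seq[pos] is MASK, "order rule must pick a still-masked position"
        seq[pos] = predict(seq, pos)
    return [int(tok) for tok in seq]  # every position is filled at this point


def right_to_left(seq: List[Optional[int]]) -> int:
    """Hypothetical order rule: unmask the rightmost masked position first."""
    return max(i for i, tok in enumerate(seq) if tok is MASK)


def toy_predict(seq: List[Optional[int]], pos: int) -> int:
    """Hypothetical stand-in for a trained denoiser; it ignores context."""
    return pos * 10


print(count_infilling_problems(8))                     # 256
print(decode_any_order(4, toy_predict, right_to_left))  # [0, 10, 20, 30]
```

The point of the sketch is the asymmetry: training has to cover arbitrary masking patterns drawn from an exponentially large set, while inference only walks one path through that set. Swapping `right_to_left` for a different rule changes the order without retraining.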