arXiv 2502.06768

Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions

By Jaeyeon Kim, Kulin Shah, et al.

Published 2025-02-10

In recent years, masked diffusion models (MDMs) have emerged as a promising alternative approach for generative modeling over discrete domains. Compared to autoregressive models (ARMs), MDMs trade off complexity at training time with flexibility at inference time. At training time, they must learn to solve an exponentially large number of infilling problems, but at inference time, they can decode tokens in essential…
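The abstract contrasts training with inference. At training time an MDM learns to fill in masked positions. At inference time it can unmask positions in an order of its choosing rather than left to right. The loop below is a minimal sketch of that inference-time freedom. It is not the authors' implementation, and every name in it is hypothetical. `toy_denoiser` stands in for a trained model and simply returns a probability vector for each position. Tokens are chosen by greedy argmax, one per step. The `adaptive=True` branch uses a simple "most confident masked position first" rule (highest top probability) as one illustrative adaptive ordering, and the paper's actual strategy may differ.

```python
import random
from typing import Callable, List, Optional, Tuple

Seq = List[Optional[int]]  # None marks a masked position
Denoiser = Callable[[Seq], List[List[float]]]


def toy_denoiser(seq: Seq, vocab_size: int = 4) -> List[List[float]]:
    """Hypothetical stand-in for a trained MDM denoiser.

    Returns one probability vector per position. Position i prefers token
    i % vocab_size, and the preference sharpens when a neighbouring position
    is already unmasked (a crude proxy for "more context, easier infilling").
    """
    n = len(seq)
    probs = []
    for i in range(n):
        unmasked_neighbours = [
            j for j in (i - 1, i + 1) if 0 <= j < n and seq[j] is not None
        ]
        peak = 0.4 + 0.1 * i / max(n - 1, 1) + 0.2 * len(unmasked_neighbours)
        rest = (1.0 - peak) / (vocab_size - 1)
        row = [rest] * vocab_size
        row[i % vocab_size] = peak
        probs.append(row)
    return probs


def decode(denoiser: Denoiser, length: int, adaptive: bool,
           seed: int = 0) -> Tuple[List[int], List[int]]:
    """Unmask one position per step until no masks remain.

    adaptive=False: pick a uniformly random masked position (vanilla order).
    adaptive=True: pick the masked position whose predicted distribution has
    the largest top probability (an assumed confidence-based rule).
    Returns the decoded sequence and the order in which positions were filled.
    """
    rng = random.Random(seed)
    seq: Seq = [None] * length
    order: List[int] = []
    while any(tok is None for tok in seq):
        probs = denoiser(seq)  # re-query the model with the current context
        masked = [i for i, tok in enumerate(seq) if tok is None]
        if adaptive:
            pos = max(masked, key=lambda i: max(probs[i]))
        else:
            pos = rng.choice(masked)
        row = probs[pos]
        seq[pos] = max(range(len(row)), key=row.__getitem__)  # greedy argmax
        order.append(pos)
    return [tok for tok in seq if tok is not None], order


if __name__ == "__main__":
    for adaptive in (False, True):
        tokens, order = decode(toy_denoiser, length=6, adaptive=adaptive)
        print("adaptive" if adaptive else "random  ", tokens, order)
```

Both modes end with the same tokens in this toy setup, because each position's argmax never changes. Only the order in which positions are filled differs. With the adaptive rule, the denoiser's confidence drives the order: the last position starts out most confident, and each newly unmasked neighbour raises the confidence of the next position to its left.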