arXiv 2502.06768

Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions

By Jaeyeon Kim, Kulin Shah, et al.

Published 2025-02-10

Wiki summary


In recent years, masked diffusion models (MDMs) have emerged as a promising alternative approach for generative modeling over discrete domains. Compared to autoregressive models (ARMs), MDMs trade off complexity at training time with flexibility at inference time. At training time, they must learn to solve an exponentially large number of infilling problems, but at inference time, they can decode tokens in essential…
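The contrast above between training and inference can be made concrete with a small sketch. The example assumes a hand-written toy denoiser (`toy_denoiser`, hypothetical) in place of a trained MDM. It uses standard MDM-style training corruption, where a random rate is sampled and each token is masked independently at that rate, so any subset of positions can be masked. At inference, decoding fills one masked position per step, and the position is chosen adaptively. The confidence-based rules used here ("top_prob" picks the position with the largest top probability; "margin" picks the largest gap between the top two probabilities) and all function names are illustrative assumptions, not the paper's code.

```python
import math
import random

MASK = None  # hypothetical mask sentinel


def mask_for_training(tokens, rng):
    """Training-time corruption: sample a rate t ~ U(0, 1), then mask each token
    independently with probability t. Any subset of positions can be masked, so a
    length-L sequence gives rise to up to 2**L distinct infilling problems."""
    t = rng.random()
    return [MASK if rng.random() < t else tok for tok in tokens]


def toy_denoiser(seq, vocab_size):
    """Hypothetical stand-in for a trained MDM. It returns a probability vector over
    the vocabulary for each masked position. Position i prefers token i % vocab_size,
    and confidence grows with i and with the number of already revealed tokens."""
    revealed = sum(t is not MASK for t in seq)
    probs = {}
    for i, t in enumerate(seq):
        if t is MASK:
            target = i % vocab_size
            logits = [2.0 + revealed + i if v == target else 0.0 for v in range(vocab_size)]
            m = max(logits)
            exps = [math.exp(x - m) for x in logits]  # numerically stable softmax
            z = sum(exps)
            probs[i] = [e / z for e in exps]
    return probs


def decode(length, vocab_size, denoiser, strategy="margin", rng=None):
    """Unmask one position per step, in an order chosen at inference time.
    strategy: "random" (uniform over masked positions), "top_prob", or "margin"."""
    rng = rng or random.Random(0)
    seq = [MASK] * length
    order = []
    while MASK in seq:
        probs = denoiser(seq, vocab_size)
        if strategy == "random":
            pos = rng.choice(sorted(probs))
        else:
            def score(i):
                p = sorted(probs[i], reverse=True)
                return p[0] - p[1] if strategy == "margin" else p[0]
            pos = max(probs, key=score)  # most confident masked position
        p = probs[pos]
        seq[pos] = max(range(vocab_size), key=lambda v: p[v])  # greedy token choice
        order.append(pos)
    return seq, order


if __name__ == "__main__":
    print(decode(4, 5, toy_denoiser, strategy="margin"))
    print(mask_for_training(list(range(8)), random.Random(1)))
```

In this toy denoiser, confidence rises with the position index, so both adaptive rules happen to decode right to left. The point is only that the order is decided per step from the model's own predictions rather than being fixed left to right as in an ARM.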
