arXiv 2502.06768

Train for the Worst, Plan for the Best: Understanding Token Ordering in Masked Diffusions

By Jaeyeon Kim, Kulin Shah, et al.

Published 2025-02-10

In recent years, masked diffusion models (MDMs) have emerged as a promising alternative approach for generative modeling over discrete domains. Compared to autoregressive models (ARMs), MDMs trade off complexity at training time with flexibility at inference time. At training time, they must learn to solve an exponentially large number of infilling problems, but at inference time, they can decode tokens in essential…
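
As a rough illustration of the "exponentially large number of infilling problems" mentioned above: for a sequence of length n, any subset of positions can be masked, which gives 2^n mask patterns, while a left-to-right autoregressive model faces only n prefix-conditioned prediction problems. The sketch below only does this counting. It is not code from the paper, and the function names are hypothetical.

```python
from itertools import combinations


def mask_patterns(n):
    """Yield every set of masked positions for a length-n sequence."""
    for k in range(n + 1):
        for masked in combinations(range(n), k):
            yield frozenset(masked)


def count_mask_patterns(n):
    # Every subset of positions is a distinct mask pattern, so the total
    # is 2**n (this count includes the empty mask, where nothing is hidden).
    return sum(1 for _ in mask_patterns(n))


def count_autoregressive_problems(n):
    # Assumption: a fixed left-to-right order, with one next-token
    # prediction problem per prefix length 0..n-1.
    return n


if __name__ == "__main__":
    for n in (4, 8, 12):
        print(n, count_autoregressive_problems(n), count_mask_patterns(n))
```

The gap grows quickly with n, which is the training-time cost the abstract contrasts with the inference-time flexibility of choosing a decoding order.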
