arXiv 2602.17664
Sink-Aware Pruning for Diffusion Language Models
By Aidar Myrzakhan, Tianyi Li, et al.
Published 2026-02-19
Wiki summary
Diffusion Language Models (DLMs) incur high inference cost due to iterative denoising, which motivates efficient pruning. Existing pruning heuristics, largely inherited from autoregressive (AR) LLMs, typically preserve attention sink tokens because AR sinks serve as stable global anchors. We show that this assumption does not hold for DLMs: the attention-sink position exhibits substantially higher variance over the full g…
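To make the notion of sink-position variance concrete, here is a minimal sketch of one way it could be measured. It is an illustration under stated assumptions, not the paper's procedure. The helper names `sink_position` and `sink_position_variance` are hypothetical. Defining the sink as the key index with the highest mean attention is an assumption. The attention maps are made-up toy values, not results from the paper.

```python
from statistics import pvariance


def sink_position(attn):
    """Return the key index receiving the most attention, averaged over queries.

    Assumption: `attn` is a list of rows (one per query token), and each row
    holds that query's attention weights over the key tokens.
    """
    num_keys = len(attn[0])
    column_mass = [sum(row[k] for row in attn) / len(attn) for k in range(num_keys)]
    return max(range(num_keys), key=column_mass.__getitem__)


def sink_position_variance(attn_per_step):
    """Return the sink index at every step and the population variance of those indices."""
    positions = [sink_position(attn) for attn in attn_per_step]
    return positions, pvariance(positions)


# Toy maps (3 steps, 3 queries, 3 keys) where the sink stays on key 0 throughout.
stable_maps = [
    [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.8, 0.1, 0.1]],
    [[0.6, 0.3, 0.1], [0.7, 0.2, 0.1], [0.9, 0.05, 0.05]],
    [[0.8, 0.1, 0.1], [0.5, 0.4, 0.1], [0.7, 0.2, 0.1]],
]

# Toy maps where the dominant key changes from step to step.
moving_maps = [
    [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.8, 0.1, 0.1]],  # sink on key 0
    [[0.1, 0.2, 0.7], [0.1, 0.3, 0.6], [0.1, 0.1, 0.8]],  # sink on key 2
    [[0.2, 0.7, 0.1], [0.3, 0.6, 0.1], [0.1, 0.8, 0.1]],  # sink on key 1
]

stable_positions, stable_var = sink_position_variance(stable_maps)
moving_positions, moving_var = sink_position_variance(moving_maps)

print("stable sink positions:", stable_positions, "variance:", stable_var)
print("moving sink positions:", moving_positions, "variance:", moving_var)
```

Under this toy definition, a sink that stays on one token gives zero variance, while a sink that moves between steps gives a positive value. A pruning rule that always protects a fixed sink position would fit the first case but not the second.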