arXiv 2602.17664

Sink-Aware Pruning for Diffusion Language Models

By Aidar Myrzakhan, Tianyi Li, et al.

Published 2026-02-19

Diffusion Language Models (DLMs) incur high inference cost due to iterative denoising, motivating efficient pruning. Existing pruning heuristics, largely inherited from autoregressive (AR) LLMs, typically preserve attention sink tokens because AR sinks serve as stable global anchors. We show that this assumption does not hold for DLMs: the attention-sink position exhibits substantially higher variance over the full g…
