arXiv 2602.17664

Sink-Aware Pruning for Diffusion Language Models

By Aidar Myrzakhan, Tianyi Li, et al.

Published 2026-02-19

Diffusion Language Models (DLMs) incur high inference cost due to iterative denoising, motivating efficient pruning. Existing pruning heuristics, largely inherited from autoregressive (AR) LLMs, typically preserve attention sink tokens because AR sinks serve as stable global anchors. We show that this assumption does not hold for DLMs: the attention-sink position exhibits substantially higher variance over the full g…
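To make the notion of sink-position variance concrete, here is a minimal sketch. It is not the paper's measurement procedure. It assumes that the "sink" at a denoising step is the key position receiving the most total attention, and that variance is taken over the per-step sink positions. The function names and the toy attention maps are hypothetical and exist only for illustration.

```python
from statistics import pvariance


def sink_position(attn_step):
    """Return the key index that receives the most attention in one step.

    attn_step is a list of rows (one per query); each row is a list of
    attention weights over keys. Treating the arg-max of received attention
    as "the sink" is an assumption made for this sketch.
    """
    n_keys = len(attn_step[0])
    received = [sum(row[k] for row in attn_step) for k in range(n_keys)]
    return max(range(n_keys), key=received.__getitem__)


def sink_position_variance(attn_steps):
    """Population variance of the sink position across denoising steps."""
    return pvariance([sink_position(step) for step in attn_steps])


def toy_step(sink, n=6):
    """Build a toy n-by-n attention map whose rows all focus on one key."""
    row = [0.02] * n
    row[sink] = 0.9  # each row still sums to 1.0 when n == 6
    return [list(row) for _ in range(n)]


# Toy data only, not results from the paper.
stable = [toy_step(0) for _ in range(4)]          # sink stays at position 0
moving = [toy_step(p) for p in (0, 3, 5, 2)]      # sink shifts between steps

print(sink_position_variance(stable))
print(sink_position_variance(moving))
```

In this toy setting, the fixed sink has zero positional variance, while the shifting sink does not. That is the kind of contrast the abstract draws between AR models and DLMs.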
