arXiv 2602.17664

Sink-Aware Pruning for Diffusion Language Models

By Aidar Myrzakhan, Tianyi Li, et al.

Published 2026-02-19

Wiki summary

Diffusion Language Models (DLMs) incur high inference cost due to iterative denoising, which motivates efficient pruning. Existing pruning heuristics, largely inherited from autoregressive (AR) LLMs, typically preserve attention sink tokens because AR sinks serve as stable global anchors. We show that this assumption does not hold for DLMs: the attention-sink position exhibits substantially higher variance over the full g…
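To make the notion of sink-position variance concrete, here is a minimal sketch in Python. It is not the paper's method or metric: it assumes, purely for illustration, that the sink at a given step is the key position receiving the largest average attention weight, and that stability is summarized by the population variance of that position across steps. The function names `sink_position` and `sink_position_variance` are hypothetical, and the attention maps are made-up toy data.

```python
from statistics import pvariance


def sink_position(attn):
    """Index of the key that receives the most attention, averaged over queries.

    attn: one attention map as a list of rows (one row per query),
    each row a list of weights over key positions.
    Assumption: "sink" means the argmax of mean received attention.
    """
    num_keys = len(attn[0])
    received = [sum(row[k] for row in attn) / len(attn) for k in range(num_keys)]
    return max(range(num_keys), key=received.__getitem__)


def sink_position_variance(attn_per_step):
    """Sink position at each step, plus the population variance of those positions."""
    positions = [sink_position(attn) for attn in attn_per_step]
    return positions, pvariance(positions)


# Toy data (made up): an AR-like pattern where key 0 stays the sink at every step.
ar_like = [
    [[0.7, 0.1, 0.1, 0.1], [0.6, 0.2, 0.1, 0.1]],
    [[0.8, 0.1, 0.05, 0.05], [0.7, 0.1, 0.1, 0.1]],
    [[0.6, 0.2, 0.1, 0.1], [0.9, 0.05, 0.03, 0.02]],
]

# Toy data (made up): a DLM-like pattern where the sink moves between steps.
dlm_like = [
    [[0.7, 0.1, 0.1, 0.1], [0.6, 0.2, 0.1, 0.1]],
    [[0.1, 0.1, 0.7, 0.1], [0.1, 0.2, 0.6, 0.1]],
    [[0.1, 0.1, 0.1, 0.7], [0.05, 0.05, 0.1, 0.8]],
]

ar_positions, ar_var = sink_position_variance(ar_like)
dlm_positions, dlm_var = sink_position_variance(dlm_like)
print("AR-like sink positions:", ar_positions, "variance:", ar_var)
print("DLM-like sink positions:", dlm_positions, "variance:", dlm_var)
```

In this toy setup the AR-like sink never moves, so its variance is zero, while the DLM-like sink shifts from step to step and yields a positive variance. A pruning rule that always protects a fixed sink position would therefore protect the wrong token at some steps in the second case.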
