arXiv 2507.06457

A Systematic Analysis of Hybrid Linear Attention

By Dustin Wang, Rui-Jie Zhu, et al.

Published 2025-07-08

Transformers face quadratic complexity and memory issues with long sequences, prompting the adoption of linear attention mechanisms using fixed-size hidden states. However, linear models often suffer from limited recall performance, leading to hybrid architectures that combine linear and full attention layers. Despite extensive hybrid architecture research, the choice of linear attention component has not been deepl…
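To make the contrast in the abstract concrete, here is a minimal sketch in pure Python. It uses the plainest form of causal linear attention (identity feature map, no normalization, no gating or decay), which is an assumption for illustration and not any specific variant the paper evaluates. The recurrent form keeps a fixed-size d_k x d_v state, while causal softmax attention must read every past key and value, so its cache grows with sequence length. The helper `hybrid_layer_pattern` and its "one full-attention layer every N layers" schedule are hypothetical and only illustrate what interleaving linear and full attention layers means.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def linear_attention_recurrent(qs, ks, vs):
    """Causal, unnormalized linear attention run as a recurrence.

    The hidden state S is a fixed d_k x d_v matrix updated as
    S_t = S_{t-1} + k_t v_t^T, and the output is o_t = S_t^T q_t.
    State memory stays O(d_k * d_v) regardless of sequence length.
    """
    d_k, d_v = len(ks[0]), len(vs[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q, k, v in zip(qs, ks, vs):
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] += k[i] * v[j]  # rank-1 write into the fixed-size state
        outputs.append([sum(q[i] * state[i][j] for i in range(d_k)) for j in range(d_v)])
    return outputs


def linear_attention_parallel(qs, ks, vs):
    """Same result, computed as o_t = sum_{s <= t} (q_t . k_s) v_s."""
    d_v = len(vs[0])
    outputs = []
    for t, q in enumerate(qs):
        out = [0.0] * d_v
        for s in range(t + 1):
            w = dot(q, ks[s])
            for j in range(d_v):
                out[j] += w * vs[s][j]
        outputs.append(out)
    return outputs


def softmax_attention(qs, ks, vs):
    """Causal softmax attention: step t attends over all t+1 past keys/values,
    so the key/value cache grows linearly and total work grows quadratically."""
    d_k, d_v = len(ks[0]), len(vs[0])
    outputs = []
    for t, q in enumerate(qs):
        scores = [dot(q, ks[s]) / math.sqrt(d_k) for s in range(t + 1)]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(x - m) for x in scores]
        total = sum(weights)
        outputs.append(
            [sum(w * vs[s][j] for s, w in enumerate(weights)) / total for j in range(d_v)]
        )
    return outputs


def hybrid_layer_pattern(num_layers: int, full_every: int) -> List[str]:
    """Hypothetical schedule: every full_every-th layer uses full attention,
    all other layers use linear attention."""
    return ["full" if (i + 1) % full_every == 0 else "linear" for i in range(num_layers)]


if __name__ == "__main__":
    qs = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
    ks = [[0.2, 0.1], [0.3, -0.4], [0.9, 0.5]]
    vs = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]
    print(linear_attention_recurrent(qs, ks, vs))
    print(linear_attention_parallel(qs, ks, vs))
    print(softmax_attention(qs, ks, vs))
    print(hybrid_layer_pattern(8, 4))  # 3 linear layers, then 1 full, repeated twice
```

The recurrent and parallel functions agree up to floating-point rounding, which is the equivalence that lets linear attention train in parallel yet decode with constant-size memory. The same fixed-size state is also why recall can suffer: every past key/value pair is compressed into d_k x d_v numbers, which is the motivation the abstract gives for mixing in full attention layers.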