arXiv 2507.06457

A Systematic Analysis of Hybrid Linear Attention

By Dustin Wang, Rui-Jie Zhu, et al.

Published 2025-07-08

Transformers face quadratic complexity and memory issues with long sequences, prompting the adoption of linear attention mechanisms using fixed-size hidden states. However, linear models often suffer from limited recall performance, leading to hybrid architectures that combine linear and full attention layers. Despite extensive hybrid architecture research, the choice of linear attention component has not been deepl…
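To make the "fixed-size hidden state" idea concrete, here is a minimal sketch of causal linear attention written as a recurrence. It assumes the simplest unnormalized, ungated form with an identity feature map; it is not any specific variant evaluated in the paper, and the function name and list-of-lists layout are illustrative choices only.

```python
# Minimal sketch of causal linear attention as a recurrence.
# Assumptions (not from the paper): unnormalized, ungated update and an
# identity feature map; names and data layout are hypothetical.

def linear_attention(queries, keys, values):
    """Process tokens one at a time using a fixed-size d_k x d_v state."""
    d_k, d_v = len(keys[0]), len(values[0])
    # The state size depends only on d_k and d_v, not on sequence length.
    state = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q, k, v in zip(queries, keys, values):
        # State update: S <- S + k^T v (outer product of key and value).
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] += k[i] * v[j]
        # Readout: o = q S.
        outputs.append(
            [sum(q[i] * state[i][j] for i in range(d_k)) for j in range(d_v)]
        )
    return outputs, state
```

Because every past token is compressed into the same `d_k x d_v` state, memory stays constant as the sequence grows, while full attention keeps every key and value around. That compression is one way to see why recall can suffer and why hybrid stacks retain some full attention layers.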