arXiv 2507.06457

A Systematic Analysis of Hybrid Linear Attention

By Dustin Wang, Rui-Jie Zhu, et al.

Published 2025-07-08

Wiki summary

Transformers face quadratic complexity and high memory costs on long sequences, which has prompted the adoption of linear attention mechanisms that use fixed-size hidden states. However, linear models often suffer from limited recall performance, which has led to hybrid architectures that combine linear and full attention layers. Despite extensive research on hybrid architectures, the choice of linear attention component has not been deepl…
