arXiv 2308.07661

Attention Is Not All You Need Anymore

By Zhe Chen

Published 2023-08-15

In recent years, the popular Transformer architecture has achieved great success in many application areas, including natural language processing and computer vision. Many existing works aim to reduce the computational and memory complexity of the Transformer's self-attention mechanism at the cost of performance. However, performance is key to the continuing success of the Transformer. In this paper, a family…
