arXiv 2207.06881
Recurrent Memory Transformer
By Aydar Bulatov, Yuri Kuratov, et al.
Published 2022-07-14
Transformer-based models have proven effective across multiple domains and tasks. Self-attention combines information from all sequence elements into context-aware representations. However, global and local information has to be stored mostly in the same element-wise representations. Moreover, the length of an input sequence is limited by the quadratic computational complexity of self-attention. In this…
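To make the quadratic-cost remark concrete, here is a minimal sketch of plain scaled dot-product attention weights in pure Python. It is an illustration only, not code from the paper: the function names `attention_weights` and `pairwise_score_count` are hypothetical, and the sketch assumes single-head attention with no learned projections.

```python
import math

def attention_weights(queries, keys):
    """Return softmax-normalized scaled dot-product attention weights.

    queries, keys: lists of equal-length float vectors, one per token.
    The result has one row per query and one column per key, so a
    sequence of n tokens attending to itself yields an n x n matrix.
    """
    d = len(keys[0])
    weights = []
    for q in queries:
        # One score per (query, key) pair: this loop is the quadratic part.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

def pairwise_score_count(seq_len):
    """Number of query-key scores for self-attention over seq_len tokens."""
    return seq_len * seq_len

# Toy sequence of 4 tokens with 2-dimensional representations (made-up values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
w = attention_weights(tokens, tokens)
```

Each row of `w` mixes information from every token in the sequence, which is what lets self-attention build context-aware representations. Because the score matrix has one entry per token pair, doubling the sequence length quadruples the number of scores, and that growth is what limits the input length.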