arXiv 2207.06881

Recurrent Memory Transformer

By Aydar Bulatov, Yuri Kuratov, et al.

Published 2022-07-14

Transformer-based models have proven effective across multiple domains and tasks. Self-attention makes it possible to combine information from all sequence elements into context-aware representations. However, global and local information has to be stored mostly in the same element-wise representations. Moreover, the length of an input sequence is limited by the quadratic computational complexity of self-attention. In this…
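To make the quadratic cost mentioned above concrete, here is a minimal sketch of single-head self-attention in plain Python. It is an illustration only, not the paper's implementation: the function name `self_attention` is hypothetical, and the learned query, key, and value projections are assumed to be the identity to keep the example short. The `pair_count` value it returns shows that a sequence of n elements needs n × n pairwise scores.

```python
import math


def self_attention(x):
    """Single-head self-attention over a list of vectors.

    Assumption for illustration: query, key and value projections are the
    identity, so each element attends with its own raw vector.
    Returns the context-aware outputs and the number of pairwise scores.
    """
    n = len(x)
    d = len(x[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    pair_count = 0
    for i in range(n):
        # Score element i against every element j: n scores per element.
        scores = []
        for j in range(n):
            scores.append(scale * sum(a * b for a, b in zip(x[i], x[j])))
            pair_count += 1
        # Numerically stable softmax over the n scores.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output i is a weighted mix of all sequence elements.
        outputs.append(
            [sum(w * x[j][k] for j, w in enumerate(weights)) for k in range(d)]
        )
    return outputs, pair_count


if __name__ == "__main__":
    for length in (4, 8, 16):
        seq = [[float(i % 3), 1.0] for i in range(length)]
        _, pairs = self_attention(seq)
        print(length, pairs)
```

Doubling the sequence length quadruples the number of scores, which is the scaling the abstract describes as the limit on input length.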
