arXiv 2207.06881
Recurrent Memory Transformer
By Aydar Bulatov, Yuri Kuratov, et al.
Published 2022-07-14
Wiki summary
Transformer-based models have proven effective across multiple domains and tasks. Self-attention combines information from all sequence elements into context-aware representations. However, global and local information must be stored mostly in the same element-wise representations. Moreover, the length of an input sequence is limited by the quadratic computational complexity of self-attention. In this…
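To make the quadratic-cost point concrete, here is a small arithmetic sketch. It is not from the paper: the function name `attention_score_entries` and the sequence lengths are hypothetical and chosen only for illustration. It counts the pairwise attention scores a single full self-attention layer computes, assuming every element attends to every other element.

```python
def attention_score_entries(seq_len: int, num_heads: int = 1) -> int:
    """Count pairwise attention scores in one full self-attention layer.

    Assumes standard dense attention: each head builds a
    seq_len x seq_len score matrix. (Illustrative helper, not the paper's code.)
    """
    return num_heads * seq_len * seq_len


if __name__ == "__main__":
    # Doubling the sequence length quadruples the number of scores.
    for n in (512, 1024, 2048):
        print(f"seq_len={n:5d} -> {attention_score_entries(n):,} scores per head")
```

Under these assumptions, going from 512 to 1024 tokens multiplies the score count by four, which is the growth that caps practical input length.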