arXiv 2207.06881

Recurrent Memory Transformer

By Aydar Bulatov, Yuri Kuratov, et al.

Published 2022-07-14

Transformer-based models have proven effective across multiple domains and tasks. Self-attention combines information from all sequence elements into context-aware representations. However, global and local information has to be stored mostly in the same element-wise representations. Moreover, the length of an input sequence is limited by the quadratic computational complexity of self-attention. In this…
