arXiv 2401.06104
Transformers are Multi-State RNNs
By Matanel Oren, Michael Hassid, et al.
Published 2024-01-11
Transformers are considered conceptually different from the previous generation of state-of-the-art NLP models, recurrent neural networks (RNNs). In this work, we demonstrate that decoder-only transformers can in fact be conceptualized as unbounded multi-state RNNs, an RNN variant with unlimited hidden state size. We further show that transformers can be converted into multi-state RNNs by fixing the size of their…
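
To make the multi-state view concrete, the following is a minimal sketch of one decoding step of single-head attention written as an RNN update. The stored key/value pairs play the role of the hidden state: with no cap it grows by one entry per token (the unbounded case), and with a cap it stays at a fixed size. The function name `msrnn_step`, the parameter `max_states`, and the drop-the-oldest eviction rule are illustrative assumptions, not names or policies taken from the paper.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def msrnn_step(state, q, k, v, max_states=None):
    """One decoding step of single-head attention viewed as an RNN update.

    state:      tuple (K, V), each of shape (n, d); n is the number of stored states.
    q, k, v:    query, key, and value vectors of the current token, each of shape (d,).
    max_states: None keeps every past token (unbounded state);
                an int caps the state size (hypothetical parameter).
    Returns the new state and the attention output for the current token.
    """
    K, V = state
    # Append the current token's key and value to the state.
    K = np.vstack([K, k[None, :]])
    V = np.vstack([V, v[None, :]])

    # Assumed eviction policy for illustration: drop the oldest entry
    # once the cap is exceeded (a sliding window).
    if max_states is not None and K.shape[0] > max_states:
        K, V = K[1:], V[1:]

    # Standard scaled dot-product attention over the stored states.
    weights = softmax(K @ q / np.sqrt(q.shape[0]))
    out = weights @ V
    return (K, V), out


# Usage: six decoding steps with the state capped at three entries.
rng = np.random.default_rng(0)
d = 4
state = (np.zeros((0, d)), np.zeros((0, d)))
for _ in range(6):
    q, k, v = rng.normal(size=(3, d))
    state, out = msrnn_step(state, q, k, v, max_states=3)
print(state[0].shape)  # (3, 4)
```

Calling the same function with `max_states=None` reproduces ordinary attention over the full key/value cache, so the only difference between the two regimes in this sketch is whether the state is allowed to grow.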