arXiv 2401.06104

Transformers are Multi-State RNNs

By Matanel Oren, Michael Hassid, et al.

Published 2024-01-11


Transformers are considered conceptually different from the previous generation of state-of-the-art NLP models: recurrent neural networks (RNNs). In this work, we demonstrate that decoder-only transformers can in fact be conceptualized as unbounded multi-state RNNs, an RNN variant with unlimited hidden state size. We further show that transformers can be converted into multi-state RNNs by fixing the size of their…
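The abstract's central claim can be made concrete with a small sketch. Below, a single attention head is written as a recurrence whose hidden state is the list of past key/value pairs (the KV cache). With `max_states=None` the state grows by one entry per token, matching the unbounded multi-state RNN view; with a finite `max_states` the layer becomes a bounded multi-state RNN. The class name `MultiStateAttention`, the pure-Python vectors, and the eviction rule (drop the stored state that received the lowest attention weight at the current step) are illustrative assumptions, not the paper's code; the excerpt above is cut off before it describes how the state size is fixed.

```python
import math
from typing import List, Optional

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class MultiStateAttention:
    """One attention head written as a multi-state RNN (hypothetical sketch).

    The hidden state is the list of stored (key, value) pairs, i.e. the KV cache.
    max_states=None keeps every pair (unbounded, as in a standard decoder);
    a finite max_states caps the number of pairs (bounded multi-state RNN).
    """

    def __init__(self, max_states: Optional[int] = None) -> None:
        self.max_states = max_states
        self.keys: List[Vector] = []
        self.values: List[Vector] = []

    def step(self, q: Vector, k: Vector, v: Vector) -> Vector:
        """Consume one token's query/key/value and return the head's output."""
        # Recurrent update: the new token's key/value joins the state.
        self.keys.append(k)
        self.values.append(v)
        # Scaled dot-product attention of the current query over all stored states.
        scale = 1.0 / math.sqrt(len(q))
        weights = softmax([dot(q, key) * scale for key in self.keys])
        out = [
            sum(w * val[i] for w, val in zip(weights, self.values))
            for i in range(len(v))
        ]
        # Assumed eviction rule (illustrative only): if over budget, drop the
        # stored state with the lowest attention weight at this step (ties -> earliest).
        if self.max_states is not None and len(self.keys) > self.max_states:
            drop = min(range(len(weights)), key=weights.__getitem__)
            del self.keys[drop]
            del self.values[drop]
        return out


if __name__ == "__main__":
    # Toy stream of (query, key, value) triples for four tokens.
    stream = [
        ([1.0, 0.0], [1.0, 0.0], [1.0, 0.0]),
        ([0.0, 1.0], [0.0, 1.0], [0.0, 1.0]),
        ([1.0, 1.0], [1.0, 1.0], [2.0, 2.0]),
        ([1.0, 0.0], [0.5, 0.5], [3.0, 0.0]),
    ]
    unbounded = MultiStateAttention()
    bounded = MultiStateAttention(max_states=2)
    for q, k, v in stream:
        unbounded.step(q, k, v)
        bounded.step(q, k, v)
    print(len(unbounded.keys), len(bounded.keys))  # 4 2
```

In this sketch eviction happens after the output is computed, so each step still attends to the newest token before the cache is trimmed back to `max_states` entries; swapping in a different rule (for example, dropping the oldest entry) only changes the `drop` line.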
