arXiv 2401.06104

Transformers are Multi-State RNNs

By Matanel Oren, Michael Hassid, et al.

Published 2024-01-11

Wiki summary

Transformers are considered conceptually different from the previous generation of state-of-the-art NLP models, recurrent neural networks (RNNs). In this work, we demonstrate that decoder-only transformers can in fact be conceptualized as unbounded multi-state RNNs, an RNN variant with unlimited hidden state size. We further show that transformers can be converted into multi-state RNNs by fixing the size of their…
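To make the multi-state view concrete, here is a minimal sketch (not the authors' code) of single-head causal attention written as a recurrent step. The "state" is the list of key and value vectors seen so far: it grows by one entry per token when left unbounded. The names `attention_step`, `run`, and `max_states` are hypothetical, and the keep-the-newest-entries rule used when `max_states` is set is only an illustrative assumption about how a state could be bounded, not the method proposed in the paper.

```python
import math


def attention_step(state, q, k, v, max_states=None):
    """One recurrent step of single-head causal attention.

    state      -- (keys, values): lists of vectors kept so far (the multi-state)
    q, k, v    -- query, key, and value vectors for the current token
    max_states -- None for an unbounded state; an int >= 1 keeps only the
                  newest entries (an illustrative sliding-window assumption)
    """
    if max_states is not None and max_states < 1:
        raise ValueError("max_states must be None or >= 1")

    keys, values = state
    keys = keys + [k]        # append the new token's key to the state
    values = values + [v]    # append the new token's value to the state
    if max_states is not None:
        keys = keys[-max_states:]
        values = values[-max_states:]

    # Scaled dot-product scores of the current query against every kept key.
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, key)) for key in keys]

    # Numerically stable softmax over the kept states.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)

    # Output is the attention-weighted average of the kept values.
    out = [
        sum(w * val[j] for w, val in zip(weights, values)) / total
        for j in range(len(v))
    ]
    return (keys, values), out


def run(tokens, max_states=None):
    """Feed (q, k, v) triples through attention_step one token at a time."""
    state = ([], [])
    outputs = []
    for q, k, v in tokens:
        state, out = attention_step(state, q, k, v, max_states)
        outputs.append(out)
    return state, outputs
```

Calling `run(tokens)` leaves a state with one entry per token, whereas `run(tokens, max_states=2)` never holds more than two entries, which is the sense in which a size-limited state behaves like a bounded RNN hidden state.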
