arXiv 2501.00663

Titans: Learning to Memorize at Test Time

By Ali Behrouz, Peilin Zhong, et al.

Published 2024-12-31

Wiki summary

Explore the paper's summary, context, and related research on Papiers.

Over more than a decade there has been an extensive research effort on how to effectively utilize recurrent models and attention. While recurrent models aim to compress the data into a fixed-size memory (called hidden state), attention allows attending to the entire context window, capturing the direct dependencies of all tokens. This more accurate modeling of dependencies, however, comes with a quadratic cost, limi…

View the original paper on arXiv