arXiv 2501.00663

Titans: Learning to Memorize at Test Time

By Ali Behrouz, Peilin Zhong, et al.

Published 2024-12-31

Citation lineage

Review the prior work and downstream research connected to this paper.

Over more than a decade there has been an extensive research effort on how to effectively utilize recurrent models and attention. While recurrent models aim to compress the data into a fixed-size memory (called hidden state), attention allows attending to the entire context window, capturing the direct dependencies of all tokens. This more accurate modeling of dependencies, however, comes with a quadratic cost, limi…

View the original paper on arXiv