arXiv 2505.23735

ATLAS: Learning to Optimally Memorize the Context at Test Time

By Ali Behrouz, Zeman Li, et al.

Published 2025-05-29

Citation lineage

Review the prior work and downstream research connected to this paper.

Transformers have been established as the most popular backbones in sequence modeling, mainly due to their effectiveness in in-context retrieval tasks and the ability to learn at scale. Their quadratic memory and time complexity, however, bound their applicability in longer sequences and so has motivated researchers to explore effective alternative architectures such as modern recurrent neural networks (a.k.a long-t…

View the original paper on arXiv