arXiv 2505.23735
ATLAS: Learning to Optimally Memorize the Context at Test Time
By Ali Behrouz, Zeman Li, et al.
Published 2025-05-29
Wiki summary
Explore the paper's summary, context, and related research on Papiers.
Transformers have been established as the most popular backbones in sequence modeling, mainly due to their effectiveness in in-context retrieval tasks and the ability to learn at scale. Their quadratic memory and time complexity, however, bound their applicability in longer sequences and so has motivated researchers to explore effective alternative architectures such as modern recurrent neural networks (a.k.a long-t…