arXiv 2505.23735

ATLAS: Learning to Optimally Memorize the Context at Test Time

By Ali Behrouz, Zeman Li, et al.

Published 2025-05-29

Wiki summary

Transformers have become the most popular backbones in sequence modeling, mainly due to their effectiveness in in-context retrieval tasks and their ability to learn at scale. Their quadratic memory and time complexity, however, limits their applicability to longer sequences and has motivated researchers to explore effective alternative architectures such as modern recurrent neural networks (a.k.a. long-t…
