arXiv 2410.13166
An Evolved Universal Transformer Memory
By Edoardo Cetin, Qi Sun, et al.
Published 2024-10-17
Prior methods propose to offset the escalating costs of modern foundation models by dropping specific parts of their contexts with hand-designed rules, while attempting to preserve their original performance. We overcome this trade-off with Neural Attention Memory Models (NAMMs), introducing a learned network for memory management that improves both the performance and efficiency of transformers. We evolve NAMMs ato…