arXiv 2410.13166
An Evolved Universal Transformer Memory
By Edoardo Cetin, Qi Sun, et al.
Published 2024-10-17
Prior methods propose to offset the escalating costs of modern foundation models by dropping specific parts of their contexts with hand-designed rules, while attempting to preserve their original performance. We overcome this trade-off with Neural Attention Memory Models (NAMMs), introducing a learned network for memory management that improves both the performance and efficiency of transformers. We evolve NAMMs ato…