arXiv 2410.13166

An Evolved Universal Transformer Memory

By Edoardo Cetin, Qi Sun, et al.

Published 2024-10-17

Prior methods propose to offset the escalating costs of modern foundation models by dropping specific parts of their contexts with hand-designed rules, while attempting to preserve their original performance. We overcome this trade-off with Neural Attention Memory Models (NAMMs), introducing a learned network for memory management that improves both the performance and efficiency of transformers. We evolve NAMMs ato…
