arXiv 2410.13166

An Evolved Universal Transformer Memory

By Edoardo Cetin, Qi Sun, et al.

Published 2024-10-17

Wiki summary

Prior methods propose to offset the escalating costs of modern foundation models by dropping specific parts of their contexts with hand-designed rules, while attempting to preserve their original performance. We overcome this trade-off with Neural Attention Memory Models (NAMMs), introducing a learned network for memory management that improves both the performance and efficiency of transformers. We evolve NAMMs ato…
