arXiv 2603.13875

GradMem: Learning to Write Context into Memory with Test-Time Gradient Descent

By Yuri Kuratov, Matvey Kairov, et al.

Published 2026-03-14

Wiki summary

Many large language model applications require conditioning on long contexts. Transformers typically support this by storing a large per-layer KV-cache of past activations, which incurs substantial memory overhead. A desirable alternative is compressive memory: read a context once, store it in a compact state, and answer many queries from that state. We study this in a context removal setting, where the model must ge…
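
To make the memory overhead concrete, the sketch below estimates the size of a per-layer KV-cache and compares it with a small fixed-size memory state. It is only a back-of-the-envelope illustration: the model dimensions, the context length, and the number of memory vectors are hypothetical values chosen for the arithmetic, not figures from the paper, and the helper names are made up for this example.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_value=2, batch_size=1):
    """Approximate KV-cache size: one key and one value vector
    per token, per KV head, per layer."""
    keys_and_values = 2  # the cache stores both K and V
    return (keys_and_values * num_layers * num_kv_heads * head_dim
            * seq_len * bytes_per_value * batch_size)


def compact_state_bytes(num_memory_vectors, hidden_dim, bytes_per_value=2):
    """Size of a fixed memory state whose footprint does not grow
    with the context length (hypothetical layout)."""
    return num_memory_vectors * hidden_dim * bytes_per_value


# Hypothetical configuration, assumed for illustration only.
cache = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128,
                       seq_len=32768)          # 16-bit values
state = compact_state_bytes(num_memory_vectors=64, hidden_dim=4096)

print(f"KV-cache:      {cache / 2**30:.1f} GiB")
print(f"Compact state: {state / 2**10:.1f} KiB")
print(f"Ratio:         {cache // state}x")
```

The KV-cache term scales linearly with the context length, while the compact state stays constant, which is the trade-off that compressive memory targets.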