arXiv 2601.07372
Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
By Xin Cheng, Wangding Zeng, et al.
Published 2026-01-12
While Mixture-of-Experts (MoE) scales capacity via conditional computation, Transformers lack a native primitive for knowledge lookup, forcing them to inefficiently simulate retrieval through computation. To address this, we introduce conditional memory as a complementary sparsity axis, instantiated via Engram, a module that modernizes classic N-gram embedding for O(1) lookup. By formulating the Sparsity Allocation p…
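
The abstract's idea of an N-gram embedding with constant-time lookup can be illustrated with a minimal sketch. This is not the paper's Engram module: the names `ngram_bucket` and `HashedNgramTable`, the polynomial hash, the table size, and the random initialization are all assumptions chosen for illustration. The sketch only shows why the cost of a lookup depends on the N-gram length, not on the number of table rows.

```python
import random


def ngram_bucket(ngram, num_buckets):
    """Map a tuple of token ids to a bucket index with a fixed polynomial hash.

    Hypothetical helper: the hash function is an assumption, not the paper's.
    """
    h = 0
    for tok in ngram:
        h = (h * 1_000_003 + tok + 1) % (2**61 - 1)
    return h % num_buckets


class HashedNgramTable:
    """Toy hashed N-gram embedding table (illustrative only)."""

    def __init__(self, num_buckets, dim, n=2, seed=0):
        rng = random.Random(seed)
        self.n = n
        self.num_buckets = num_buckets
        # One embedding row per bucket; in a real model these would be learned.
        self.rows = [
            [rng.gauss(0.0, 0.02) for _ in range(dim)]
            for _ in range(num_buckets)
        ]

    def lookup(self, token_ids, pos):
        """Return the embedding row for the N-gram ending at position `pos`."""
        start = max(0, pos - self.n + 1)
        ngram = tuple(token_ids[start:pos + 1])
        # Hashing touches at most n tokens and indexing is a single list access,
        # so the cost does not grow with the number of buckets.
        return self.rows[ngram_bucket(ngram, self.num_buckets)]


table = HashedNgramTable(num_buckets=1024, dim=8, n=2)
tokens = [17, 42, 17, 42]
vec_a = table.lookup(tokens, 1)  # bigram (17, 42)
vec_b = table.lookup(tokens, 3)  # the same bigram later in the sequence
```

Because the same bigram always hashes to the same bucket, `vec_a` and `vec_b` refer to the same row. Distinct N-grams may also collide in one bucket; how a production module handles collisions is outside what this sketch covers.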