arXiv 2601.07372
Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models
By Xin Cheng, Wangding Zeng, et al.
Published 2026-01-12
While Mixture-of-Experts (MoE) scales capacity via conditional computation, Transformers lack a native primitive for knowledge lookup, forcing them to inefficiently simulate retrieval through computation. To address this, we introduce conditional memory as a complementary sparsity axis, instantiated via Engram, a module that modernizes classic N-gram embedding for O(1) lookup. By formulating the Sparsity Allocation p…
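
To make the idea of lookup-based memory concrete, the following is a minimal Python sketch of a hashed N-gram embedding table. It is not the paper's Engram module: the class name `HashedNgramTable`, the rolling hash, the table size, and the random initialization are all illustrative assumptions. It only shows how a short window of token ids can be mapped to a stored vector with a cost that does not grow with the size of the table.

```python
import random


class HashedNgramTable:
    """Toy hashed N-gram embedding table (hypothetical; not the paper's Engram)."""

    def __init__(self, n=2, num_slots=1024, dim=8, seed=0):
        self.n = n                  # N-gram order (assumed value)
        self.num_slots = num_slots  # number of rows in the table (assumed value)
        rng = random.Random(seed)
        # Random vectors stand in for learned embeddings.
        self.table = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_slots)
        ]

    def slot(self, ngram):
        # Deterministic polynomial rolling hash over token ids, reduced to a row index.
        h = 0
        for tok in ngram:
            h = (h * 1000003 + tok + 1) % (2**61 - 1)
        return h % self.num_slots

    def lookup(self, token_ids):
        """Return one vector per position, keyed by the N-gram ending at that position."""
        out = []
        for i in range(len(token_ids)):
            start = max(0, i - self.n + 1)  # shorter window at the sequence start
            ngram = tuple(token_ids[start:i + 1])
            out.append(self.table[self.slot(ngram)])
        return out


table = HashedNgramTable(n=2, num_slots=64, dim=4)
vectors = table.lookup([3, 7, 3, 7])
# Positions 1 and 3 both end the bigram (3, 7), so they fetch the same row.
same_row = vectors[1] is vectors[3]
```

Each position costs one hash over at most `n` tokens and one index into the table, which is the sense in which the lookup is constant-time here. Distinct N-grams can collide in the same row; this sketch makes no attempt to handle that.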