arXiv 2601.07372

Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models

By Xin Cheng, Wangding Zeng, et al.

Published 2026-01-12

While Mixture-of-Experts (MoE) scales capacity via conditional computation, Transformers lack a native primitive for knowledge lookup, forcing them to inefficiently simulate retrieval through computation. To address this, we introduce conditional memory as a complementary sparsity axis, instantiated via Engram, a module that modernizes classic N-gram embedding for O(1) lookup. By formulating the Sparsity Allocation p…
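To make "N-gram embedding for O(1) lookup" concrete, here is a minimal sketch of a classic hashed N-gram embedding table, the kind of primitive the abstract says Engram modernizes. It is not Engram itself. The class name `HashedNGramEmbedding`, the polynomial hash, the bucket count, and the random initialization are hypothetical choices made for illustration, and the excerpt does not say how Engram combines a retrieved vector with the Transformer's hidden states.

```python
import random
from typing import List, Sequence


class HashedNGramEmbedding:
    """Hypothetical sketch of a hashed N-gram embedding table (not the paper's Engram)."""

    def __init__(self, n: int, num_buckets: int, dim: int, seed: int = 0) -> None:
        self.n = n
        self.num_buckets = num_buckets
        rng = random.Random(seed)
        # Fixed-size table: its size is chosen freely instead of growing as vocab_size ** n.
        self.table: List[List[float]] = [
            [rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_buckets)
        ]

    def bucket(self, ngram: Sequence[int]) -> int:
        # Deterministic polynomial hash of token IDs into [0, num_buckets).
        # Assumption for illustration; distinct N-grams may collide in one bucket.
        h = 0
        for tok in ngram:
            h = (h * 1_000_003 + tok + 1) % self.num_buckets
        return h

    def lookup(self, token_ids: Sequence[int], position: int) -> List[float]:
        # Take the (up to) n tokens ending at `position`; early positions use shorter N-grams.
        start = max(0, position - self.n + 1)
        ngram = token_ids[start : position + 1]
        # One hash plus one row read: cost depends on n, not on the table size.
        return self.table[self.bucket(ngram)]


emb = HashedNGramEmbedding(n=3, num_buckets=1024, dim=8)
tokens = [5, 6, 7, 5, 6, 7]
# Positions 2 and 5 both end the trigram (5, 6, 7), so they retrieve the same row.
print(emb.lookup(tokens, 2) == emb.lookup(tokens, 5))
```

The design choice worth noting is that retrieval is a direct address computation rather than a learned computation over the context: identical local contexts always map to the same stored vector, at the cost of possible hash collisions between unrelated N-grams.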