arXiv 2410.07145
Stuffed Mamba: Oversized States Lead to the Inability to Forget
By Yingfa Chen, Xinrong Zhang, et al.
Published 2024-10-09
Recent recurrent architectures, such as Mamba and RWKV, have shown strong language capabilities. Unlike transformer-based models, these architectures encode all contextual information into a fixed-size state, which makes inference highly efficient. However, this approach can cause information interference, where information from different tokens conflicts, resulting in performance degradation and incoherent ou…