arXiv 2410.07145
Stuffed Mamba: Oversized States Lead to the Inability to Forget
By Yingfa Chen, Xinrong Zhang, et al.
Published 2024-10-09
Wiki summary
Recent recurrent architectures, such as Mamba and RWKV, have shown strong language capabilities. Unlike transformer-based models, these architectures encode all contextual information into a fixed-size state, which makes inference highly efficient. However, this approach can cause information interference, in which information from different tokens conflicts, resulting in performance degradation and incoherent ou…
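
The contrast between a fixed-size recurrent state and a per-token cache can be illustrated with a toy sketch. This is not the update rule of Mamba or RWKV: the decayed running sum, the names `recurrent_step`, `cache_step`, and `run`, and the values of `STATE_SIZE` and `decay` are all assumptions chosen for illustration. It only shows that the recurrent state keeps the same size however many tokens are processed, while an attention-style cache grows by one entry per token.

```python
# Toy illustration only: NOT the actual Mamba or RWKV update rule.
STATE_SIZE = 4  # hypothetical fixed state dimension


def recurrent_step(state, token_vec, decay=0.9):
    """Fold a new token into a fixed-size state (decayed running sum)."""
    return [decay * s + x for s, x in zip(state, token_vec)]


def cache_step(cache, token_vec):
    """Attention-style cache: keep every token, so memory grows with context."""
    return cache + [token_vec]


def run(num_tokens):
    state = [0.0] * STATE_SIZE
    cache = []
    for t in range(num_tokens):
        # Synthetic token vector; real models use learned embeddings.
        token_vec = [float((t + i) % 3) for i in range(STATE_SIZE)]
        state = recurrent_step(state, token_vec)
        cache = cache_step(cache, token_vec)
    return state, cache


state, cache = run(100)
# The state length stays at STATE_SIZE; the cache holds one entry per token.
print(len(state), len(cache))
```

Because every token is summed into the same few state slots, contributions from different tokens overlap and cannot be separated afterwards, which gives a rough intuition for the interference described above.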