arXiv 2410.07145

Stuffed Mamba: Oversized States Lead to the Inability to Forget

By Yingfa Chen, Xinrong Zhang, et al.

Published 2024-10-09

Wiki summary


Recent advances in recurrent architectures, such as Mamba and RWKV, have shown strong language capabilities. Unlike transformer-based models, these architectures encode all contextual information into a fixed-size state, which makes inference highly efficient. However, this approach can cause information interference, where information from different tokens conflicts within the state, resulting in performance degradation and incoherent ou…
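To make the contrast between a fixed-size state and a growing context concrete, here is a minimal toy sketch in Python. It is not the update rule of Mamba or RWKV: the function names, the `decay` factor, and the rule of writing token `t` into slot `t % state_size` are all assumptions chosen purely for illustration. It shows only that a recurrent state keeps the same size regardless of input length, that a transformer-style cache grows with every token, and that tokens sharing a slot get mixed together.

```python
def run_recurrent(tokens, decay=0.9, state_size=4):
    """Toy recurrence (hypothetical rule): fold every token into a fixed-size state."""
    state = [0.0] * state_size
    for t, x in enumerate(tokens):
        # Older content fades by a constant factor at every step.
        state = [decay * s for s in state]
        # Assumed write rule: token t lands in slot t % state_size,
        # so tokens that are state_size steps apart share a slot.
        state[t % state_size] += x
    return state


def run_kv_cache(tokens):
    """Toy transformer-style cache: one stored entry per token seen."""
    cache = []
    for x in tokens:
        cache.append(x)
    return cache


long_input = [1.0] * 1000
fixed_state = run_recurrent(long_input)
growing_cache = run_kv_cache(long_input)

# Tokens 0 and 4 both write to slot 0, so their contributions are mixed.
mixed = run_recurrent([1.0, 0.0, 0.0, 0.0, 1.0])
```

In this sketch `fixed_state` always holds four numbers while `growing_cache` holds one entry per token, and `mixed[0]` combines the decayed first token with the fifth, a simple picture of how a bounded state forces different tokens to share storage.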
