arXiv 2312.00752
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
By Albert Gu and Tri Dao
Published 2023-12-01
Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module. Many subquadratic-time architectures such as linear attention, gated convolution and recurrent models, and structured state space models (SSMs) have been developed to address Transformers' computational inefficiency on long sequences, but they…