arXiv 2312.00752

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

By Albert Gu and Tri Dao

Published 2023-12-01

Wiki summary


Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module. Many subquadratic-time architectures such as linear attention, gated convolution and recurrent models, and structured state space models (SSMs) have been developed to address Transformers' computational inefficiency on long sequences, but they…
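To make "subquadratic-time" concrete for the SSM family named above, here is a minimal sketch. It assumes a diagonal, time-invariant, already-discretized state space model with a scalar input channel. The function name `ssm_scan` and the parameters `a`, `b`, `c` are hypothetical and chosen for illustration. This is not Mamba's selective mechanism, where the parameters depend on the input. It is only the basic linear recurrence, which processes a length-L sequence in a single pass with constant-size state.

```python
from typing import List


def ssm_scan(a: List[float], b: List[float], c: List[float], x: List[float]) -> List[float]:
    """Hypothetical sketch: run a diagonal, time-invariant discrete SSM over x.

    For each of the N state channels i:
        h_t[i] = a[i] * h_{t-1}[i] + b[i] * x_t
    and the output is the readout
        y_t = sum_i c[i] * h_t[i]
    Cost is O(L * N) time with O(N) state, i.e. linear in sequence length L.
    """
    n = len(a)
    if len(b) != n or len(c) != n:
        raise ValueError("a, b and c must have the same number of state channels")
    h = [0.0] * n  # hidden state starts at zero
    ys: List[float] = []
    for x_t in x:
        # Update every state channel from its previous value and the current input.
        h = [a[i] * h[i] + b[i] * x_t for i in range(n)]
        # Project the state down to a scalar output.
        ys.append(sum(c[i] * h[i] for i in range(n)))
    return ys


if __name__ == "__main__":
    # One-channel example: an impulse decays by a factor of 0.5 each step.
    print(ssm_scan([0.5], [1.0], [1.0], [1.0, 0.0, 0.0]))  # [1.0, 0.5, 0.25]
```

Unrolling the recurrence shows it equals a causal convolution with kernel K_k = Σ_i c[i] · a[i]^k · b[i]. The scan computes that convolution without ever forming the L × L matrix of pairwise interactions that full attention builds, which is where the quadratic cost comes from.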
