arXiv 2501.13484
MambaQuant: Quantizing the Mamba Family with Variance Aligned Rotation Methods
By Zukang Xu, Yuxuan Yue, et al.
Published 2025-01-23
Mamba is an efficient sequence model that rivals Transformers and demonstrates significant potential as a foundational architecture for various tasks. Quantization is commonly used in neural networks to reduce model size and computational latency. However, applying quantization to Mamba remains underexplored, and existing quantization methods, which have been effective for CNN and Transformer models, appear inadequa…