arXiv 2602.11761

MiniCPM-SALA: Hybridizing Sparse and Linear Attention for Efficient Long-Context Modeling

By MiniCPM Team, Wenhao An, et al.

Published 2026-02-12

The evolution of large language models (LLMs) towards applications with ultra-long contexts faces challenges posed by the high computational and memory costs of the Transformer architecture. While existing sparse and linear attention mechanisms attempt to mitigate these issues, they typically involve a trade-off between memory efficiency and model performance. This paper introduces MiniCPM-SALA, a 9B-parameter hybri…
