arXiv 2504.05586
Finding Fantastic Experts in MoEs: A Unified Study for Expert Dropping Strategies and Observations
By Ajay Jaiswal, Jianyu Wang, et al.
Published 2025-04-08
Sparsely activated Mixture-of-Experts (SMoE) has shown promise in scaling up the learning capacity of neural networks. However, vanilla SMoEs have issues such as expert redundancy and heavy memory requirements, making them inefficient and non-scalable, especially for resource-constrained scenarios. Expert-level sparsification of SMoEs involves pruning the least important experts to address these limitations. In this…
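As a rough illustration of what expert-level sparsification means, the sketch below ranks experts by how often a router selected them and drops the least-used ones. The importance criterion (activation frequency), the function names `least_used_experts` and `drop_experts`, and the toy routing data are all assumptions made for this example; they are not taken from the paper, whose own criteria are not shown in the truncated abstract.

```python
from collections import Counter


def least_used_experts(routing, num_experts, num_to_drop):
    """Return the ids of the `num_to_drop` experts that received the fewest tokens.

    `routing` is a list with one tuple of selected expert ids per token.
    Activation frequency is an assumed importance proxy, not the paper's criterion.
    """
    counts = Counter(expert for token_experts in routing for expert in token_experts)
    # Sort by (usage count, id) so ties break deterministically toward lower ids.
    ranked = sorted(range(num_experts), key=lambda e: (counts[e], e))
    return sorted(ranked[:num_to_drop])


def drop_experts(experts, dropped):
    """Keep only the experts whose id is not in `dropped`."""
    dropped = set(dropped)
    return {eid: weights for eid, weights in experts.items() if eid not in dropped}


# Hypothetical top-2 routing decisions for six tokens over four experts.
routing = [(0, 1), (0, 2), (1, 0), (0, 1), (2, 0), (1, 0)]
# Placeholder strings stand in for real expert weight tensors.
experts = {0: "w0", 1: "w1", 2: "w2", 3: "w3"}

dropped = least_used_experts(routing, num_experts=4, num_to_drop=2)
pruned = drop_experts(experts, dropped)
print(dropped, sorted(pruned))
```

In a real model the router would also have to be prevented from selecting the removed experts, for example by masking their logits; that step is omitted here.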