arXiv 2504.05586
Finding Fantastic Experts in MoEs: A Unified Study for Expert Dropping Strategies and Observations
By Ajay Jaiswal, Jianyu Wang, et al.
Published 2025-04-08
Sparsely activated Mixture-of-Experts (SMoE) has shown promise in scaling up the learning capacity of neural networks. However, vanilla SMoEs have issues such as expert redundancy and heavy memory requirements, making them inefficient and non-scalable, especially for resource-constrained scenarios. Expert-level sparsification of SMoEs involves pruning the least important experts to address these limitations. In this…
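To make the idea of expert-level sparsification concrete, here is a minimal sketch of dropping the least important experts from one MoE layer. It assumes that importance is measured by how often the router selects each expert. That criterion is an illustrative assumption, not necessarily the one studied in the paper, and the names `expert_usage`, `experts_to_keep`, and the sample routing data are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def expert_usage(routing: Iterable[Sequence[int]], num_experts: int) -> List[int]:
    """Count how often the router selected each expert across all tokens."""
    counts: Counter = Counter()
    for chosen in routing:
        counts.update(chosen)
    return [counts.get(e, 0) for e in range(num_experts)]


def experts_to_keep(usage: Sequence[int], num_drop: int) -> List[int]:
    """Return the indices of experts that remain after dropping the
    num_drop least-used ones (ties are broken by lower index first).

    Assumption: usage frequency stands in for "importance" here.
    """
    if not 0 <= num_drop < len(usage):
        raise ValueError("num_drop must be in [0, number of experts)")
    ranked = sorted(range(len(usage)), key=lambda e: (usage[e], e))
    dropped = set(ranked[:num_drop])
    return [e for e in range(len(usage)) if e not in dropped]


# Hypothetical top-2 routing decisions for six tokens over four experts.
routing = [(0, 1), (0, 2), (0, 1), (1, 2), (0, 1), (0, 3)]
usage = expert_usage(routing, num_experts=4)
kept = experts_to_keep(usage, num_drop=1)
print(usage, kept)
```

In a real model the surviving indices would be used to slice the expert weights and the router's output dimension, which is where the memory saving comes from.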