arXiv 2504.05586

Finding Fantastic Experts in MoEs: A Unified Study for Expert Dropping Strategies and Observations

By Ajay Jaiswal, Jianyu Wang, et al.

Published 2025-04-08

Sparsely activated Mixture-of-Experts (SMoE) has shown promise in scaling up the learning capacity of neural networks. However, vanilla SMoEs have issues such as expert redundancy and heavy memory requirements, making them inefficient and non-scalable, especially for resource-constrained scenarios. Expert-level sparsification of SMoEs involves pruning the least important experts to address these limitations. In this…
