arXiv:2504.05586

Finding Fantastic Experts in MoEs: A Unified Study for Expert Dropping Strategies and Observations

By Ajay Jaiswal, Jianyu Wang, et al.

Published 2025-04-08

Sparsely activated Mixture-of-Experts (SMoE) has shown promise in scaling up the learning capacity of neural networks. However, vanilla SMoEs have issues such as expert redundancy and heavy memory requirements, making them inefficient and non-scalable, especially for resource-constrained scenarios. Expert-level sparsification of SMoEs involves pruning the least important experts to address these limitations. In this…