arXiv 2603.06003

EvoESAP: Non-Uniform Expert Pruning for Sparse MoE

By Zongfang Liu, Shengkun Tang, et al.

Published 2026-03-06

Sparse Mixture-of-Experts (SMoE) language models achieve strong capability at low per-token compute, yet deployment remains memory- and throughput-bound because the full expert pool must be stored and served. Post-training expert pruning reduces this cost, but most methods focus on which experts to prune within each layer and default to a uniform layer-wise sparsity allocation, even though the allocation can strongl…
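The distinction between a uniform and a non-uniform layer-wise sparsity allocation can be shown with a minimal Python sketch. It rests on two assumptions. First, every expert has a scalar importance score, and the sketch does not specify how that score is computed. Second, scores are comparable across layers. `uniform_allocation`, `global_allocation`, and `prune_by_importance` are hypothetical helpers. The global-ranking rule is a simple illustrative baseline for a non-uniform allocation. It is not the EvoESAP method.

```python
from typing import List


def uniform_allocation(num_layers: int, experts_per_layer: int, sparsity: float) -> List[int]:
    """Uniform baseline: prune the same number of experts in every layer."""
    k = round(experts_per_layer * sparsity)
    return [k] * num_layers


def global_allocation(importance: List[List[float]], sparsity: float, min_keep: int = 1) -> List[int]:
    """Hypothetical non-uniform allocation: rank all experts globally by score
    (assumes scores are comparable across layers) and prune the lowest ones,
    never leaving a layer with fewer than `min_keep` experts."""
    total = sum(len(scores) for scores in importance)
    budget = round(total * sparsity)
    # (score, layer) pairs, lowest score first.
    candidates = sorted(
        (score, layer) for layer, scores in enumerate(importance) for score in scores
    )
    counts = [0] * len(importance)
    for _, layer in candidates:
        if budget == 0:
            break
        if len(importance[layer]) - counts[layer] > min_keep:
            counts[layer] += 1
            budget -= 1
    return counts


def prune_by_importance(importance: List[List[float]], prune_counts: List[int]) -> List[List[int]]:
    """Within each layer, drop the `k` lowest-scored experts and return the kept expert indices."""
    kept = []
    for scores, k in zip(importance, prune_counts):
        order = sorted(range(len(scores)), key=lambda i: scores[i])
        dropped = set(order[:k])
        kept.append([i for i in range(len(scores)) if i not in dropped])
    return kept


if __name__ == "__main__":
    # Toy model: 2 MoE layers with 4 experts each (made-up scores for illustration).
    importance = [[0.9, 0.8, 0.1, 0.2], [0.5, 0.6, 0.7, 0.4]]
    uni = uniform_allocation(2, 4, 0.25)
    glob = global_allocation(importance, 0.25)
    print(uni, prune_by_importance(importance, uni))    # [1, 1] [[0, 1, 3], [0, 1, 2]]
    print(glob, prune_by_importance(importance, glob))  # [2, 0] [[0, 1], [0, 1, 2, 3]]
```

Both allocations remove the same total number of experts, 2 of 8. In the uniform case, each layer loses exactly one expert. In the non-uniform case, the whole budget goes to the layer whose experts score lowest. This is the kind of layer-wise difference that a uniform default rules out.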

View the original paper on arXiv