arXiv 2408.15664
Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts
By Lean Wang, Huazuo Gao, et al.
Published 2024-08-28
For Mixture-of-Experts (MoE) models, an unbalanced expert load will lead to routing collapse or increased computational overhead. Existing methods commonly employ an auxiliary loss to encourage load balance, but a large auxiliary loss will introduce non-negligible interference gradients into training and thus impair the model performance. In order to control load balance while not producing undesired gradients durin…