arXiv 2408.15664
Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts
By Lean Wang, Huazuo Gao, et al.
Published 2024-08-28
For Mixture-of-Experts (MoE) models, an unbalanced expert load leads to routing collapse or increased computational overhead. Existing methods commonly employ an auxiliary loss to encourage load balance, but a large auxiliary loss introduces non-negligible interference gradients into training and thus impairs model performance. To control load balance while not producing undesired gradients durin…