arXiv 2408.15664

Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts

By Lean Wang, Huazuo Gao, et al.

Published 2024-08-28


For Mixture-of-Experts (MoE) models, an unbalanced expert load will lead to routing collapse or increased computational overhead. Existing methods commonly employ an auxiliary loss to encourage load balance, but a large auxiliary loss will introduce non-negligible interference gradients into training and thus impair the model performance. In order to control load balance while not producing undesired gradients durin…
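The abstract refers to the auxiliary loss that existing methods add to encourage load balance. The sketch below is a hedged illustration of one common formulation of that loss, not the method this paper proposes. It uses the Switch Transformer style term alpha * N * sum_i f_i * P_i, generalized to top-K routing. The function names (`softmax`, `top_k_experts`, `aux_balance_loss`), the coefficient alpha = 0.01, and the toy logits are all hypothetical choices made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one token's router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_experts(probs, k):
    """Indices of the k highest-probability experts for one token."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]


def aux_balance_loss(batch_logits, k, alpha=0.01):
    """Conventional auxiliary load-balancing loss: alpha * N * sum_i f_i * P_i.

    f_i: fraction of the T*k routing slots assigned to expert i (hard counts).
    P_i: mean router probability given to expert i over the batch.
    A perfectly balanced batch yields a loss of about alpha.
    """
    num_tokens = len(batch_logits)
    num_experts = len(batch_logits[0])
    counts = [0] * num_experts
    prob_sums = [0.0] * num_experts
    for logits in batch_logits:
        probs = softmax(logits)
        for i in top_k_experts(probs, k):
            counts[i] += 1
        for i, p in enumerate(probs):
            prob_sums[i] += p
    f = [c / (num_tokens * k) for c in counts]
    P = [s / num_tokens for s in prob_sums]
    return alpha * num_experts * sum(fi * pi for fi, pi in zip(f, P))


# Hypothetical toy batch: four experts, top-1 routing, four tokens.
balanced = [[10.0 if i == t % 4 else 0.0 for i in range(4)] for t in range(4)]
collapsed = [[10.0, 0.0, 0.0, 0.0] for _ in range(4)]
print(round(aux_balance_loss(balanced, k=1), 4))   # 0.01
print(round(aux_balance_loss(collapsed, k=1), 4))  # 0.04
```

Only P_i depends smoothly on the router logits, because the counts behind f_i come from a hard top-K choice. In an autodiff framework the loss therefore pushes gradients back into the router through P_i. Raising alpha strengthens balancing but also strengthens these extra gradients, which is the trade-off the abstract describes.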
