arXiv 2408.15664
Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts
By Lean Wang, Huazuo Gao, et al.
Published 2024-08-28
For Mixture-of-Experts (MoE) models, an unbalanced expert load will lead to routing collapse or increased computational overhead. Existing methods commonly employ an auxiliary loss to encourage load balance, but a large auxiliary loss will introduce non-negligible interference gradients into training and thus impair the model performance. In order to control load balance while not producing undesired gradients durin…
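To make "an auxiliary loss to encourage load balance" concrete, here is a minimal sketch of one common form of such a loss: the product of each expert's dispatch fraction f_i and its mean routing score P_i, in the spirit of GShard/Switch Transformer. The exact formulation and coefficient used in this paper are not stated in the excerpt, so every function name here (`top_k_route`, `balance_loss`, `max_violation`) and the coefficient `alpha` are hypothetical illustrations. The sketch is plain Python with no autograd. In a real framework, f_i comes from hard top-k counts and carries no gradient, while P_i is differentiable, so the auxiliary term sends extra gradients into the router. Those are the interference gradients the abstract refers to.

```python
import math
from typing import List

Scores = List[List[float]]  # scores[t][i]: routing score of token t for expert i
Routes = List[List[int]]    # routes[t]: indices of experts chosen for token t


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one token's expert logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(scores: Scores, k: int) -> Routes:
    """Pick the k highest-scoring experts per token (ties keep the lower index)."""
    return [
        sorted(range(len(s)), key=lambda i: s[i], reverse=True)[:k] for s in scores
    ]


def expert_counts(routes: Routes, num_experts: int) -> List[int]:
    """Number of tokens dispatched to each expert."""
    counts = [0] * num_experts
    for chosen in routes:
        for i in chosen:
            counts[i] += 1
    return counts


def balance_loss(scores: Scores, routes: Routes, k: int, alpha: float) -> float:
    """Hypothetical auxiliary loss: alpha * sum_i f_i * P_i.

    f_i = N / (K * T) * (tokens routed to expert i); it equals 1.0 for every
          expert under perfectly uniform routing. It is built from hard top-k
          counts, so in an autodiff framework it would carry no gradient.
    P_i = mean routing score of expert i over the T tokens; this is the term
          through which the auxiliary gradient would reach the router.
    """
    num_tokens, num_experts = len(scores), len(scores[0])
    counts = expert_counts(routes, num_experts)
    f = [num_experts * c / (k * num_tokens) for c in counts]
    p = [sum(s[i] for s in scores) / num_tokens for i in range(num_experts)]
    return alpha * sum(fi * pi for fi, pi in zip(f, p))


def max_violation(routes: Routes, num_experts: int) -> float:
    """(max load - mean load) / mean load; 0.0 means perfectly balanced."""
    counts = expert_counts(routes, num_experts)
    mean = sum(counts) / num_experts
    return (max(counts) - mean) / mean


if __name__ == "__main__":
    # Toy router probabilities for 4 tokens over 2 experts.
    balanced = [
        softmax(logits)
        for logits in ([2.0, 0.0], [0.0, 2.0], [2.0, 0.0], [0.0, 2.0])
    ]
    collapsed = [softmax([2.0, 0.0]) for _ in range(4)]
    for name, scores in (("balanced", balanced), ("collapsed", collapsed)):
        routes = top_k_route(scores, k=1)
        loss = balance_loss(scores, routes, k=1, alpha=0.01)
        print(name, loss, max_violation(routes, 2))
```

The collapsed routing, where every token picks expert 0, yields a larger loss and a nonzero violation, so minimizing the loss pushes the router toward balance. Raising `alpha` strengthens that push but also scales up the router gradient that flows through P_i. This is the trade-off between load balance and model performance that the abstract describes.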