arXiv 2408.15664

Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts

By Lean Wang, Huazuo Gao, et al.

Published 2024-08-28

For Mixture-of-Experts (MoE) models, an unbalanced expert load will lead to routing collapse or increased computational overhead. Existing methods commonly employ an auxiliary loss to encourage load balance, but a large auxiliary loss will introduce non-negligible interference gradients into training and thus impair the model performance. In order to control load balance while not producing undesired gradients durin…
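To make the auxiliary-loss baseline mentioned above concrete, here is a minimal sketch of one widely used formulation: the number of experts times the sum, over experts, of the fraction of tokens routed to an expert multiplied by its mean router probability. This is an illustration of the kind of loss the abstract refers to as existing practice, not code from the paper. The function names, the softmax router, and the `top_k` parameter are assumptions made for the example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def aux_balance_loss(router_logits, top_k=1):
    """Illustrative auxiliary load-balancing loss (hypothetical helper).

    router_logits: one list of per-expert logits for each token.
    Returns num_experts * sum_i f[i] * P[i], where f[i] is the fraction of
    routed token slots assigned to expert i and P[i] is the mean router
    probability of expert i.
    """
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    probs = [softmax(row) for row in router_logits]

    # Count how many token slots each expert receives under top-k routing.
    counts = [0] * num_experts
    for p in probs:
        ranked = sorted(range(num_experts), key=lambda i: p[i], reverse=True)
        for i in ranked[:top_k]:
            counts[i] += 1

    f = [c / (num_tokens * top_k) for c in counts]
    P = [sum(p[i] for p in probs) / num_tokens for i in range(num_experts)]
    return num_experts * sum(fi * pi for fi, pi in zip(f, P))


# Balanced routing: each of 4 tokens prefers a different expert.
balanced = [[10.0 if e == t else 0.0 for e in range(4)] for t in range(4)]
# Collapsed routing: every token prefers expert 0.
collapsed = [[10.0, 0.0, 0.0, 0.0] for _ in range(4)]

balanced_loss = aux_balance_loss(balanced)
collapsed_loss = aux_balance_loss(collapsed)
```

In this sketch the loss is near 1 for the balanced case and approaches the number of experts as routing collapses onto one expert. In training, such a term is typically added to the task loss with a weighting coefficient (an assumed hyperparameter here), and the gradients of that added term are the interference the abstract describes.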