arXiv 1701.06538
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
By Noam Shazeer, Azalia Mirhoseini, et al.
Published 2017-01-23
The capacity of a neural network to absorb information is limited by its number of parameters. Conditional computation, where parts of the network are active on a per-example basis, has been proposed in theory as a way of dramatically increasing model capacity without a proportional increase in computation. In practice, however, there are significant algorithmic and performance challenges. In this work, we address t…