arXiv 1701.06538

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

By Noam Shazeer, Azalia Mirhoseini, et al.

Published 2017-01-23


The capacity of a neural network to absorb information is limited by its number of parameters. Conditional computation, where parts of the network are active on a per-example basis, has been proposed in theory as a way of dramatically increasing model capacity without a proportional increase in computation. In practice, however, there are significant algorithmic and performance challenges. In this work, we address t…
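The sketch below illustrates the conditional-computation idea from the abstract: a gate picks only the top k of n experts for each input, so the work per example grows with k rather than n. It uses sparse top-k gating in the spirit of the title's "sparsely-gated" layer, but it is a simplified illustration, not the paper's implementation. The gating network (one hypothetical scalar weight per expert), the toy scalar experts, and the names `top_k_gate` and `sparse_moe` are assumptions made for this example. There is no noise term, no training, and no batching.

```python
import math


def softmax(xs):
    # Numerically stable softmax; entries equal to -inf become exactly 0.0.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(logits, k):
    # Keep the k largest logits and mask the rest to -inf before the softmax,
    # so unselected experts receive a gate value of exactly zero.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    masked = [logits[i] if i in top else float("-inf") for i in range(len(logits))]
    return softmax(masked)


def sparse_moe(x, experts, gate_weights, k):
    # Hypothetical gating network: logit_i = w_i * x (an assumption, not the paper's).
    logits = [w * x for w in gate_weights]
    gates = top_k_gate(logits, k)
    output = 0.0
    calls = 0
    for g, expert in zip(gates, experts):
        if g > 0.0:  # conditional computation: experts with zero gate are never run
            output += g * expert(x)
            calls += 1
    return output, calls


if __name__ == "__main__":
    # Eight toy experts; expert s simply scales its input by s (hypothetical).
    experts = [lambda x, s=s: s * x for s in range(1, 9)]
    out, calls = sparse_moe(1.0, experts, [0.1 * i for i in range(8)], k=2)
    print(out, calls)
```

With eight experts and k=2, only two expert functions execute for this input. Adding more experts increases capacity while the per-example cost stays tied to k.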
