arXiv 2006.16668
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
By Dmitry Lepikhin, HyoukJoong Lee, et al.
Published 2020-06-30
Wiki summary
Neural network scaling has been critical for improving model quality in many real-world machine learning applications with vast amounts of training data and compute. Although scaling is widely regarded as a reliable route to better model quality, it brings challenges such as computation cost, ease of programming, and efficient implementation on parallel devices. GShard is a module…