arXiv 2006.16668

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding

By Dmitry Lepikhin, HyoukJoong Lee, et al.

Published 2020-06-30

Wiki summary

Neural network scaling has been critical for improving model quality in many real-world machine learning applications with vast amounts of training data and compute. Although scaling is widely regarded as a reliable route to better model quality, it raises challenges in computation cost, ease of programming, and efficient implementation on parallel devices. GShard is a module…
