arXiv 1910.02054
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
By Samyam Rajbhandari, Jeff Rasley, et al.
Published 2019-10-04
Large deep learning models offer significant accuracy gains, but training billions to trillions of parameters is challenging. Existing solutions such as data and model parallelisms exhibit fundamental limitations to fit these models into limited device memory, while obtaining computation, communication and development efficiency. We develop a novel solution, Zero Redundancy Optimizer (ZeRO), to optimize memory, vast…