arXiv 1910.02054

ZeRO: Memory Optimizations Toward Training Trillion Parameter Models

By Samyam Rajbhandari, Jeff Rasley, et al.

Published 2019-10-04

Large deep learning models offer significant accuracy gains, but training billions to trillions of parameters is challenging. Existing solutions such as data and model parallelisms exhibit fundamental limitations to fit these models into limited device memory, while obtaining computation, communication and development efficiency. We develop a novel solution, Zero Redundancy Optimizer (ZeRO), to optimize memory, vast…
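
The abstract above is truncated before it explains where the memory goes. The following is a minimal sketch of the per-device model-state memory model described in the full paper, not code from the authors. It assumes mixed-precision Adam: 2 bytes of fp16 parameters, 2 bytes of fp16 gradients, and K = 12 bytes of fp32 optimizer state (master parameters, momentum, variance) per parameter. The integer `stage` argument and the function name `per_device_model_state_bytes` are hypothetical labels for the paper's three cumulative partitioning stages (optimizer states, then gradients, then parameters) across N_d data-parallel devices. Activations and temporary buffers are ignored.

```python
# Sketch of ZeRO-DP model-state memory per device (assumes mixed-precision Adam).
FP16_PARAM_BYTES = 2   # fp16 parameters
FP16_GRAD_BYTES = 2    # fp16 gradients
K_OPTIMIZER_BYTES = 12  # fp32 master params + momentum + variance


def per_device_model_state_bytes(num_params: float, dp_degree: int, stage: int) -> float:
    """Model-state bytes per data-parallel device (hypothetical helper).

    stage 0: plain data parallelism, every device holds a full replica
    stage 1: optimizer states partitioned across devices (P_os)
    stage 2: optimizer states + gradients partitioned (P_os+g)
    stage 3: optimizer states + gradients + parameters partitioned (P_os+g+p)
    """
    if dp_degree < 1:
        raise ValueError("dp_degree must be >= 1")
    n = dp_degree
    p = FP16_PARAM_BYTES * num_params
    g = FP16_GRAD_BYTES * num_params
    k = K_OPTIMIZER_BYTES * num_params
    if stage == 0:
        return p + g + k          # 16 * Psi bytes, fully replicated
    if stage == 1:
        return p + g + k / n      # only optimizer state is sharded
    if stage == 2:
        return p + (g + k) / n    # gradients are sharded too
    if stage == 3:
        return (p + g + k) / n    # every model state is sharded
    raise ValueError("stage must be 0, 1, 2, or 3")


if __name__ == "__main__":
    # Worked example: 7.5B parameters on 64 data-parallel devices.
    for s in range(4):
        gb = per_device_model_state_bytes(7.5e9, 64, s) / 1e9
        print(f"stage {s}: {gb:.1f} GB")
```

For 7.5 billion parameters and N_d = 64 this gives roughly 120 GB, 31.4 GB, 16.6 GB and 1.9 GB per device. Replication is the dominant cost: plain data parallelism stores all 16 bytes per parameter on every device, while full partitioning divides that total by N_d.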

View the original paper on arXiv