arXiv 1910.02054

ZeRO: Memory Optimizations Toward Training Trillion Parameter Models

By Samyam Rajbhandari, Jeff Rasley, et al.

Published 2019-10-04

Wiki summary

Large deep learning models offer significant accuracy gains, but training models with billions to trillions of parameters is challenging. Existing solutions such as data and model parallelism have fundamental limitations in fitting these models into limited device memory while retaining computation, communication, and development efficiency. We develop a novel solution, Zero Redundancy Optimizer (ZeRO), to optimize memory, vast…
