arXiv 1909.08053

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

By Mohammad Shoeybi, Mostofa Patwary, et al.

Published 2019-09-17

Wiki summary

Recent work in language modeling demonstrates that training large transformer models advances the state of the art in Natural Language Processing applications. However, very large models can be quite difficult to train due to memory constraints. In this work, we present our techniques for training very large transformer models and implement a simple, efficient intra-layer model parallel approach that enables trainin…