arXiv 1909.08053

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

By Mohammad Shoeybi, Mostofa Patwary, et al.

Published 2019-09-17

Recent work in language modeling demonstrates that training large transformer models advances the state of the art in Natural Language Processing applications. However, very large models can be quite difficult to train due to memory constraints. In this work, we present our techniques for training very large transformer models and implement a simple, efficient intra-layer model parallel approach that enables trainin…
