arXiv 1909.08053
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
By Mohammad Shoeybi, Mostofa Patwary, et al.
Published 2019-09-17
Recent work in language modeling demonstrates that training large transformer models advances the state of the art in Natural Language Processing applications. However, very large models can be quite difficult to train due to memory constraints. In this work, we present our techniques for training very large transformer models and implement a simple, efficient intra-layer model parallel approach that enables trainin…