arXiv 2304.11277

PyTorch FSDP: Experiences on Scaling Fully Sharded Data Parallel

By Yanli Zhao, Andrew Gu, et al.

Published 2023-04-21

It is widely acknowledged that large models have the potential to deliver superior performance across a broad range of domains. Despite the remarkable progress made in the field of machine learning systems research, which has enabled the development and exploration of large models, such abilities remain confined to a small group of advanced users and industry leaders, resulting in an implicit technical barrier for t…