arXiv 2601.02360

Heterogeneous Low-Bandwidth Pre-Training of LLMs

By Yazan Obeidi, Amir Sarfi, et al.

Published 2026-01-05

Wiki summary

Pre-training large language models (LLMs) increasingly requires distributed compute, yet bandwidth constraints make it difficult to scale beyond well-provisioned datacenters, especially when model parallelism forces frequent, large inter-device communications. We study whether SparseLoCo, a low-communication data-parallel method based on infrequent synchronization and sparse pseudo-gradient exchange, can be combined…
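
To make "infrequent synchronization and sparse pseudo-gradient exchange" concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. The function names, the top-k selection rule, the plain averaging, and the outer step with an implicit learning rate of 1 are all assumptions made for illustration. The summary above only states that workers synchronize rarely and exchange sparse pseudo-gradients.

```python
def pseudo_gradient(global_params, local_params):
    """Difference between the last synchronized weights and the locally trained ones."""
    return [g - l for g, l in zip(global_params, local_params)]


def top_k_sparsify(values, k):
    """Keep the k largest-magnitude entries as {index: value}; drop the rest.

    Assumption: top-k is used here as one common sparsification rule.
    """
    keep = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)[:k]
    return {i: values[i] for i in sorted(keep)}


def aggregate(sparse_updates, size):
    """Average the sparse updates from all workers into one dense update."""
    dense = [0.0] * size
    for update in sparse_updates:
        for i, v in update.items():
            dense[i] += v / len(sparse_updates)
    return dense


# Two hypothetical workers that each ran several local steps from the same weights.
global_params = [1.0, 2.0, 3.0, 4.0]
worker_params = [
    [0.5, 2.0, 3.25, 4.0],
    [1.0, 1.0, 3.0, 4.5],
]

# Each worker sends only k index/value pairs instead of the full parameter vector.
updates = [top_k_sparsify(pseudo_gradient(global_params, w), k=1) for w in worker_params]
delta = aggregate(updates, len(global_params))

# Simplified outer step (assumption: subtract the averaged update directly).
new_global = [g - d for g, d in zip(global_params, delta)]
```

In this toy run each worker transmits one index/value pair per synchronization rather than four values, which is the source of the bandwidth saving. Real systems typically add details that this sketch omits, such as carrying the dropped entries forward to later rounds.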
