arXiv 2509.14786
Pre-training under infinite compute
By Konwoo Kim, Suhas Kotha, et al.
Published 2025-09-18
Since compute grows much faster than web text available for language model pre-training, we ask how one should approach pre-training under fixed data and no compute constraints. We first show that existing data-constrained approaches of increasing epoch count and parameter count eventually overfit, and we significantly improve upon such recipes by properly tuning regularization, finding that the optimal weight decay…