arXiv 2509.14786

Pre-training under infinite compute

By Konwoo Kim, Suhas Kotha, et al.

Published 2025-09-18

Since compute grows much faster than web text available for language model pre-training, we ask how one should approach pre-training under fixed data and no compute constraints. We first show that existing data-constrained approaches of increasing epoch count and parameter count eventually overfit, and we significantly improve upon such recipes by properly tuning regularization, finding that the optimal weight decay…
