arXiv 2509.14786
Pre-training under infinite compute
By Konwoo Kim, Suhas Kotha, et al.
Published 2025-09-18
Since compute grows much faster than web text available for language model pre-training, we ask how one should approach pre-training under fixed data and no compute constraints. We first show that existing data-constrained approaches of increasing epoch count and parameter count eventually overfit, and we significantly improve upon such recipes by properly tuning regularization, finding that the optimal weight decay…