arXiv 2509.14786
Pre-training under infinite compute
By Konwoo Kim, Suhas Kotha, et al.
Published 2025-09-18
Since compute grows much faster than web text available for language model pre-training, we ask how one should approach pre-training under fixed data and no compute constraints. We first show that existing data-constrained approaches of increasing epoch count and parameter count eventually overfit, and we significantly improve upon such recipes by properly tuning regularization, finding that the optimal weight decay…