arXiv 2509.14786

Pre-training under infinite compute

By Konwoo Kim, Suhas Kotha, et al.

Published 2025-09-18

Since compute grows much faster than web text available for language model pre-training, we ask how one should approach pre-training under fixed data and no compute constraints. We first show that existing data-constrained approaches of increasing epoch count and parameter count eventually overfit, and we significantly improve upon such recipes by properly tuning regularization, finding that the optimal weight decay…
