arXiv 2001.08361

Scaling Laws for Neural Language Models

By Jared Kaplan, Sam McCandlish, et al.

Published 2020-01-23

Wiki summary


We study empirical scaling laws for language model performance on the cross-entropy loss. The loss scales as a power law with model size, dataset size, and the amount of compute used for training, with some trends spanning more than seven orders of magnitude. Other architectural details such as network width or depth have minimal effects within a wide range. Simple equations govern the dependence of overfitting on m…
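To make the power-law claim concrete, here is a minimal sketch of how such a trend could be fitted. It assumes a pure power law of the form L(n) = (n_c / n) ** alpha, where n stands for a scale variable such as model size. The functional form, the function names, and the constants `TRUE_ALPHA` and `TRUE_N_C` are illustrative assumptions, not values taken from the paper, and the data is synthetic. A power law is a straight line in log-log space, so ordinary least squares on the logarithms recovers the exponent.

```python
import math


def power_law_loss(n, n_c, alpha):
    """Loss predicted by an assumed pure power law L(n) = (n_c / n) ** alpha."""
    return (n_c / n) ** alpha


def fit_power_law(ns, losses):
    """Fit log L = alpha * log(n_c) - alpha * log(n) by ordinary least squares.

    Returns the tuple (alpha, n_c).
    """
    xs = [math.log(n) for n in ns]
    ys = [math.log(loss) for loss in losses]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # equals -alpha
    intercept = y_mean - slope * x_mean    # equals alpha * log(n_c)
    alpha = -slope
    n_c = math.exp(intercept / alpha)
    return alpha, n_c


# Hypothetical constants, chosen for illustration only (not from the paper).
TRUE_ALPHA = 0.08
TRUE_N_C = 1e14

# Synthetic, noise-free data: sizes from 10^3 to 10^10, seven orders of magnitude.
sizes = [10 ** k for k in range(3, 11)]
losses = [power_law_loss(n, TRUE_N_C, TRUE_ALPHA) for n in sizes]

alpha_hat, n_c_hat = fit_power_law(sizes, losses)
print(f"alpha ~ {alpha_hat:.4f}, n_c ~ {n_c_hat:.3e}")
```

With real measurements the points would scatter around the line, and the fitted exponent would only be meaningful over the range where the log-log trend stays straight.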
