arXiv:2201.02177

Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets

By Alethea Power, Yuri Burda, et al.

Published 2022-01-06

Wiki summary


In this paper we propose to study generalization of neural networks on small algorithmically generated datasets. In this setting, questions about data efficiency, memorization, generalization, and speed of learning can be studied in great detail. In some situations we show that neural networks learn through a process of "grokking" a pattern in the data, improving generalization performance from random chance level t…
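As a rough illustration of what a "small algorithmically generated dataset" could look like, the sketch below enumerates every equation of a binary operation and splits the equations at random into training and validation sets. The operation (addition modulo a prime), the modulus `p=97`, the 50% training fraction, and the function name `make_modular_addition_dataset` are assumptions chosen for this example; the summary text above does not specify them.

```python
import random


def make_modular_addition_dataset(p=97, train_fraction=0.5, seed=0):
    """Enumerate every equation a + b = c (mod p) and split it at random.

    Hypothetical helper for illustration only; p, train_fraction, and
    seed are assumed values, not settings taken from the summary.
    """
    # The full dataset is the complete operation table: p * p equations.
    equations = [(a, b, (a + b) % p) for a in range(p) for b in range(p)]

    # Shuffle with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(equations)

    # The first part is used for training; the held-out remainder
    # measures generalization to equations never seen in training.
    n_train = int(train_fraction * len(equations))
    return equations[:n_train], equations[n_train:]


train, val = make_modular_addition_dataset()
print(len(train), len(val))
```

Because the whole table is finite and known, the validation set contains exactly the equations the model never saw, so training accuracy reflects memorization and validation accuracy reflects generalization.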
