arXiv 2506.04434

Grokking and Generalization Collapse: Insights from HTSR theory

By Hari K. Prakash and Charles H. Martin

Published 2025-06-04

We study the well-known grokking phenomenon in neural networks (NNs) using a 3-layer MLP trained on a 1k-sample subset of MNIST, with and without weight decay, and discover a novel third phase, anti-grokking, that occurs very late in training and resembles, but is distinct from, the familiar pre-grokking phases: test accuracy collapses while training accuracy stays perfect. This late-stage collapse is distinct from…
