arXiv 2506.04434
Grokking and Generalization Collapse: Insights from HTSR theory
By Hari K. Prakash and Charles H. Martin
Published 2025-06-04
We study the well-known grokking phenomenon in neural networks (NNs) using a 3-layer MLP trained on a 1k-sample subset of MNIST, with and without weight decay, and discover a novel third phase, anti-grokking, that occurs very late in training and resembles, but is distinct from, the familiar pre-grokking phases: test accuracy collapses while training accuracy stays perfect. This late-stage collapse is distinct, from…