arXiv 2506.04434
Grokking and Generalization Collapse: Insights from HTSR theory
By Hari K. Prakash and Charles H. Martin
Published 2025-06-04
We study the well-known grokking phenomenon in neural networks (NNs) using a 3-layer MLP trained on a 1k-sample subset of MNIST, with and without weight decay, and discover a novel third phase, anti-grokking, that occurs very late in training and resembles, but is distinct from, the familiar pre-grokking phases: test accuracy collapses while training accuracy stays perfect. This late-stage collapse is distinct from…