Grokking Beyond Neural Networks: An Empirical Exploration with Model Complexity

2023-10-26

Jack Miller, Charles O'Neill, Thang Bui


Code

github.com/jackmiller2003/tiny-gen
Official · In paper · PyTorch · ★ 4

Abstract

In some settings, neural networks exhibit a phenomenon known as grokking, where they achieve perfect or near-perfect accuracy on the validation set long after the same performance has been achieved on the training set. In this paper, we discover that grokking is not limited to neural networks but occurs in other settings such as Gaussian process (GP) classification, GP regression, linear regression and Bayesian neural networks. We also uncover a mechanism by which to induce grokking on algorithmic datasets via the addition of dimensions containing spurious information. The presence of the phenomenon in non-neural architectures shows that grokking is not restricted to settings considered in current theoretical and empirical studies. Instead, grokking may be possible in any model where solution search is guided by complexity and error.
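
As a rough illustration of what "the addition of dimensions containing spurious information" could look like, the sketch below builds a toy parity dataset and appends random bits that are independent of the label. This is one possible reading of the abstract, not the authors' code: the parity task, the function names `make_parity_dataset` and `add_spurious_dimensions`, and the choice of 3 informative bits and 30 spurious bits are all assumptions made for illustration.

```python
import random


def make_parity_dataset(num_samples, num_bits, rng):
    """Binary inputs labelled by the parity of their bits (a toy algorithmic task)."""
    inputs = [[rng.randint(0, 1) for _ in range(num_bits)] for _ in range(num_samples)]
    labels = [sum(row) % 2 for row in inputs]
    return inputs, labels


def add_spurious_dimensions(inputs, num_spurious, rng):
    """Append random bits, drawn independently of the label, to every input row."""
    return [row + [rng.randint(0, 1) for _ in range(num_spurious)] for row in inputs]


# Hypothetical sizes, chosen only for illustration.
rng = random.Random(0)
inputs, labels = make_parity_dataset(num_samples=256, num_bits=3, rng=rng)
augmented = add_spurious_dimensions(inputs, num_spurious=30, rng=rng)
```

To look for delayed generalisation, one would train a model on `augmented` and track training and validation accuracy over many steps. Whether grokking actually appears depends on the model, regularisation and dataset sizes, none of which are specified here.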

Tasks

regression
