SOTAVerified

Over-parameterised Shallow Neural Networks with Asymmetrical Node Scaling: Global Convergence Guarantees and Feature Learning

2023-02-02Code Available0· sign in to hype

Francois Caron, Fadhel Ayed, Paul Jung, Hoil Lee, Juho Lee, Hongseok Yang

Code Available — Be the first to reproduce this paper.

Reproduce

Code

Abstract

We consider gradient-based optimisation of wide, shallow neural networks, where the output of each hidden node is scaled by a positive parameter. The scaling parameters are non-identical, differing from the classical Neural Tangent Kernel (NTK) parameterisation. We prove that for large such neural networks, with high probability, gradient flow and gradient descent converge to a global minimum and can learn features in some sense, unlike in the NTK parameterisation. We perform experiments illustrating our theoretical results and discuss the benefits of such scaling in terms of prunability and transfer learning.

Tasks

Reproductions