
Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK

2020-07-09

Yuanzhi Li, Tengyu Ma, Hongyang R. Zhang


Abstract

We consider the dynamics of gradient descent for learning a two-layer neural network. We assume the input $x \in \mathbb{R}^d$ is drawn from a Gaussian distribution and the label of $x$ satisfies $f^\star(x) = a^\top |W^\star x|$, where $a \in \mathbb{R}^d$ is a nonnegative vector and $W^\star \in \mathbb{R}^{d \times d}$ is an orthonormal matrix. We show that an over-parametrized two-layer neural network with ReLU activation, trained by gradient descent from random initialization, can provably learn the ground truth network with population loss at most $o(1/d)$ in polynomial time with polynomial samples. On the other hand, we prove that any kernel method, including the Neural Tangent Kernel, with a polynomial number of samples in $d$, has population loss at least $\Omega(1/d)$.
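
The following is a minimal sketch (not the authors' code) of the setting described in the abstract: Gaussian inputs on $\mathbb{R}^d$, a ground-truth network $f^\star(x) = a^\top |W^\star x|$ with a nonnegative $a$ and an orthonormal $W^\star$, and an over-parametrized two-layer ReLU student trained by plain gradient descent from random initialization. The width, step size, sample size, and iteration count below are illustrative assumptions, not values from the paper.

```python
# Minimal sketch of the data model and training loop (illustrative only).
import torch

torch.manual_seed(0)
d, m, n = 20, 512, 4096        # input dim, student width (m >> d), sample size

# Ground-truth (teacher) parameters: nonnegative a, orthonormal W*.
a_star = torch.rand(d)                          # nonnegative vector a
W_star, _ = torch.linalg.qr(torch.randn(d, d))  # orthonormal matrix W*

def teacher(x):
    # f*(x) = a^T |W* x|, with the absolute value applied entrywise.
    return (x @ W_star.T).abs() @ a_star

# Inputs drawn from a standard Gaussian on R^d.
X = torch.randn(n, d)
y = teacher(X)

# Over-parametrized two-layer ReLU student, randomly initialized.
W = (torch.randn(m, d) / d ** 0.5).requires_grad_()
v = (torch.randn(m) / m ** 0.5).requires_grad_()

def student(x):
    return torch.relu(x @ W.T) @ v

lr = 1e-3                                        # illustrative step size
for step in range(2000):
    loss = ((student(X) - y) ** 2).mean()        # empirical squared loss
    loss.backward()
    with torch.no_grad():                        # plain gradient descent update
        W -= lr * W.grad
        v -= lr * v.grad
        W.grad.zero_()
        v.grad.zero_()

# Estimate the population loss on fresh Gaussian samples.
with torch.no_grad():
    X_test = torch.randn(n, d)
    loss_test = ((student(X_test) - teacher(X_test)) ** 2).mean()
print(f"train loss {loss.item():.4f}, test loss {loss_test.item():.4f}")
```

The student uses $m \gg d$ hidden units to reflect over-parametrization, and the update is full-batch gradient descent rather than SGD, matching the dynamics the abstract refers to; the $o(1/d)$ versus $\Omega(1/d)$ separation from kernel methods is the paper's theoretical result, not something this toy run certifies.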
