Memorizing Gaussians with no over-parameterization via gradient descent on neural networks
2020-03-28
Amit Daniely
Abstract
We prove that a single step of gradient descent over a depth-two network with q hidden neurons, starting from orthogonal initialization, can memorize Ω(dq/log^4(d)) independent and randomly labeled Gaussians in R^d. The result is valid for a large class of activation functions, including the absolute value.
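The setup the abstract describes can be sketched in a few lines of numpy: a depth-two network with q hidden units and absolute-value activation, rows of the first layer initialized orthonormally, trained for exactly one full-batch gradient step on randomly labeled Gaussian inputs. This is an illustrative toy, not the paper's construction; the dimensions, step size, squared loss, and the choice to update only the first layer are assumptions made here for concreteness.

```python
import numpy as np

rng = np.random.default_rng(0)

d, q, n = 64, 64, 32   # input dim, hidden width, sample count (illustrative, not the paper's scaling)
lr = 0.1               # step size (assumed, not taken from the paper)

# n independent standard Gaussians in R^d with random +/-1 labels
X = rng.standard_normal((n, d))
y = rng.choice([-1.0, 1.0], size=n)

# Depth-two network f(x) = v . sigma(W x) with sigma = |.| (absolute value)
# Orthogonal initialization: rows of W are orthonormal (requires q <= d)
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
W = Q[:q, :]                                  # q x d with orthonormal rows
v = rng.choice([-1.0, 1.0], size=q) / np.sqrt(q)

def forward(W, X):
    return np.abs(X @ W.T) @ v                # predictions, shape (n,)

# One step of full-batch gradient descent on squared loss, over W only
H = X @ W.T                                   # pre-activations, n x q
err = forward(W, X) - y                       # dL/dpred for L = mean 0.5*(pred - y)^2
# d|h|/dh = sign(h), so dL/dW_{jk} = mean_i err_i * v_j * sign(H_{ij}) * X_{ik}
grad_W = ((err[:, None] * v[None, :] * np.sign(H)).T @ X) / n
W1 = W - lr * grad_W

acc = np.mean(np.sign(forward(W1, X)) == y)
print(f"training sign-agreement after one step: {acc:.2f}")
```

Memorization in the paper's sense means the post-step network fits all n labels; with the toy sizes above the one-step accuracy merely illustrates the mechanics and carries no guarantee.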