Analysis of the expected L_2 error of an over-parametrized deep neural network estimate learned by gradient descent without regularization
Selina Drews, Michael Kohler
Abstract
Recent results show that estimates defined by over-parametrized deep neural networks learned by applying gradient descent to a regularized empirical L_2 risk are universally consistent and achieve good rates of convergence. In this paper, we show that the regularization term is not necessary to obtain similar results. Given a suitably chosen initialization of the network, a suitable number of gradient descent steps, and a suitable step size, we show that an estimate without a regularization term is universally consistent for bounded predictor variables. Additionally, we show that if the regression function is Hölder smooth with Hölder exponent 1/2 ≤ p ≤ 1, the L_2 error converges to zero with a convergence rate of approximately n^{-1/(1+d)}. Furthermore, in the case of an interaction model, where the regression function consists of a sum of Hölder smooth functions with d^* components, a rate of convergence is derived which does not depend on the input dimension d.
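To make the setting concrete, the following is a minimal sketch of gradient descent applied to an unregularized empirical L_2 risk, (1/n) * sum_i (f(x_i) - y_i)^2, for a network with more parameters than samples. It is not the estimate analyzed in the paper: the single hidden layer, the sigmoid activation, the Gaussian initialization of the inner weights, the zero initialization of the outer weights, the default step size of 0.5/width, and the name `fit_unregularized` are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_unregularized(x, y, width=200, steps=300, step_size=None, seed=0):
    """Gradient descent on the empirical L_2 risk with no penalty term.

    Illustrative assumptions: one hidden layer, sigmoid activation,
    Gaussian inner weights, outer weights initialized to zero.
    """
    rng = np.random.default_rng(seed)
    n, d = x.shape
    if step_size is None:
        step_size = 0.5 / width  # assumed heuristic, shrinks as the width grows

    w = rng.normal(size=(d, width))  # inner weights
    b = rng.normal(size=width)       # inner biases
    v = np.zeros(width)              # outer weights

    risks = []
    for _ in range(steps):
        h = sigmoid(x @ w + b)       # hidden activations, shape (n, width)
        resid = h @ v - y            # residuals f(x_i) - y_i
        risks.append(float(np.mean(resid ** 2)))

        # Gradients of (1/n) * sum_i resid_i^2; no regularization term is added.
        grad_v = (2.0 / n) * (h.T @ resid)
        dh = resid[:, None] * v[None, :] * h * (1.0 - h)
        grad_w = (2.0 / n) * (x.T @ dh)
        grad_b = (2.0 / n) * dh.sum(axis=0)

        v -= step_size * grad_v
        w -= step_size * grad_w
        b -= step_size * grad_b

    def predict(x_new):
        return sigmoid(x_new @ w + b) @ v

    return predict, risks


# Usage: bounded predictors in [0, 1]^2 and a target built from a square root,
# which is Hölder smooth with exponent 1/2 (synthetic data, for illustration).
rng = np.random.default_rng(1)
x_train = rng.uniform(size=(50, 2))
y_train = np.sqrt(np.abs(x_train[:, 0] - 0.5)) + 0.05 * rng.normal(size=50)
predict, risks = fit_unregularized(x_train, y_train)
print(risks[0], risks[-1])
```

With 50 samples and a hidden layer of width 200, the network has 800 trainable parameters, so it is over-parametrized in the sense that the parameter count exceeds the sample size. The number of steps and the step size are the only controls on the fit here, mirroring the role they play in the abstract.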