Bayesian Inference with Deep Weakly Nonlinear Networks
Boris Hanin, Alexander Zlokapa
Abstract
We show at a physics level of rigor that Bayesian inference with a fully connected neural network and a shaped nonlinearity of the form φ(t) = t + ψ t^3/L is (perturbatively) solvable in the regime where the number of training datapoints P, the input dimension N_0, the network layer widths N, and the network depth L are simultaneously large. Our results hold under weak assumptions on the data; the main constraint is that P < N_0. We provide techniques to compute the model evidence and posterior to arbitrary order in 1/N and at arbitrary temperature. We report the following results from the first-order computation:

1. When the width N is much larger than the depth L and the training set size P, neural network Bayesian inference coincides with Bayesian inference using a kernel. The value of ψ determines the curvature of the sphere, hyperbola, or plane into which the training data is implicitly embedded under the feature map.

2. When LP/N is a small constant, neural network Bayesian inference departs from the kernel regime. At zero temperature, it is equivalent to Bayesian inference using a data-dependent kernel, and LP/N serves as an effective depth that controls the extent of feature learning.

3. In the restricted case of deep linear networks (ψ = 0) and noisy data, we exhibit a simple data model for which evidence and generalization error are optimal at zero temperature. As LP/N increases, both evidence and generalization further improve, demonstrating the benefit of depth in benign overfitting.
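To make the model family concrete, the following is a minimal NumPy sketch of a fully connected network with the shaped nonlinearity φ(t) = t + ψ t^3/L, with variable names (N0, N, L, psi) mirroring the abstract's notation. The 1/√(fan-in) weight scaling and the linear readout layer are standard conventions assumed here, not details the abstract specifies.

```python
import numpy as np

def phi(t, psi, L):
    """Shaped nonlinearity phi(t) = t + psi * t^3 / L from the abstract."""
    return t + psi * t**3 / L

def forward(x, weights, psi):
    """Forward pass of a fully connected network with shaped hidden layers.

    `weights` is a list of L+1 matrices; the L hidden layers apply phi,
    the output layer is linear.
    """
    L = len(weights) - 1
    h = x
    for W in weights[:-1]:
        # 1/sqrt(fan_in) scaling keeps preactivations O(1) at large width.
        h = phi(W @ h / np.sqrt(h.shape[0]), psi, L)
    W_out = weights[-1]
    return W_out @ h / np.sqrt(h.shape[0])

# Hypothetical sizes for illustration: input dim N0, width N, depth L.
rng = np.random.default_rng(0)
N0, N, L, psi = 32, 256, 8, 0.5
dims = [N0] + [N] * L + [1]
weights = [rng.standard_normal((dims[i + 1], dims[i])) for i in range(L + 1)]
x = rng.standard_normal(N0)
print(forward(x, weights, psi))
```

At ψ = 0 this reduces to the deep linear network of result 3; the 1/L scaling of the cubic term is what keeps the network "weakly" nonlinear as the depth grows.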
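Result 1 states that when N ≫ L, P, the network posterior coincides with Bayesian inference using a fixed kernel. As a reference point, here is the standard kernel (Gaussian-process) posterior that such inference computes, with temperature playing the role of observation-noise variance. The specific kernel induced by the shaped network is derived in the paper, so the RBF kernel in the demo below is only a placeholder.

```python
import numpy as np

def kernel_posterior(K_train, K_cross, k_test, y, temperature):
    """Posterior mean and variance of Bayesian inference with a fixed kernel.

    K_train: (P, P) Gram matrix on the training inputs,
    K_cross: (P,) kernel values between training inputs and a test input,
    k_test:  scalar kernel value at the test input.
    """
    A = K_train + temperature * np.eye(len(y))
    alpha = np.linalg.solve(A, y)
    mean = K_cross @ alpha
    var = k_test - K_cross @ np.linalg.solve(A, K_cross)
    return mean, var

# Tiny demo: P = 5 points in dimension N0 = 3, placeholder RBF kernel.
rng = np.random.default_rng(1)
X = rng.standard_normal((5, 3))
y = rng.standard_normal(5)
x_star = rng.standard_normal(3)
rbf = lambda a, b: np.exp(-0.5 * np.sum((a - b) ** 2))
K_train = np.array([[rbf(a, b) for b in X] for a in X])
K_cross = np.array([rbf(a, x_star) for a in X])
mean, var = kernel_posterior(K_train, K_cross, rbf(x_star, x_star), y, 0.1)
print(mean, var)
```

In the feature-learning regime of result 2, the same formulas apply at zero temperature but with K_train replaced by a data-dependent kernel rather than a fixed one.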