Surprises in High-Dimensional Ridgeless Least Squares Interpolation
Trevor Hastie, Andrea Montanari, Saharon Rosset, Ryan J. Tibshirani
Abstract
Interpolators -- estimators that achieve zero training error -- have attracted growing attention in machine learning, mainly because state-of-the-art neural networks appear to be models of this type. In this paper, we study minimum ℓ_2 norm ("ridgeless") interpolation in high-dimensional least squares regression. We consider two different models for the feature distribution: a linear model, where the feature vectors x_i ∈ R^p are obtained by applying a linear transform to a vector of i.i.d. entries, x_i = Σ^{1/2} z_i (with z_i ∈ R^p); and a nonlinear model, where the feature vectors are obtained by passing the input through a random one-layer neural network, x_i = φ(W z_i) (with z_i ∈ R^d, W ∈ R^{p×d} a matrix of i.i.d. entries, and φ an activation function acting componentwise on W z_i). We recover -- in a precise quantitative way -- several phenomena that have been observed in large-scale neural networks and kernel machines, including the "double descent" behavior of the prediction risk, and the potential benefits of overparametrization.
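To make the setup concrete, the sketch below (not code from the paper) generates features under both models described in the abstract and computes the minimum ℓ_2 norm ("ridgeless") interpolator as the pseudoinverse solution β̂ = X⁺ y. The dimensions, covariance Σ, activation φ = tanh, and signal β* are illustrative assumptions chosen only to show the mechanics in the overparametrized regime p > n.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, d = 100, 200, 50  # illustrative sizes; p > n so exact interpolation is possible

# Linear feature model: x_i = Sigma^{1/2} z_i with z_i ~ N(0, I_p)
Sigma_sqrt = np.diag(np.linspace(0.5, 1.5, p)) ** 0.5  # hypothetical covariance
Z = rng.standard_normal((n, p))
X_lin = Z @ Sigma_sqrt

# Nonlinear feature model: x_i = phi(W z_i), W in R^{p x d} with i.i.d. entries
W = rng.standard_normal((p, d)) / np.sqrt(d)
Z_small = rng.standard_normal((n, d))
X_nonlin = np.tanh(Z_small @ W.T)  # phi = tanh, chosen for illustration

# Response from a hypothetical linear signal plus noise
beta_star = rng.standard_normal(p) / np.sqrt(p)
y = X_lin @ beta_star + 0.1 * rng.standard_normal(n)

# Ridgeless interpolator: the minimum-l2-norm solution among all beta with X beta = y,
# computed via the Moore-Penrose pseudoinverse (X has full row rank here)
beta_hat = np.linalg.pinv(X_lin) @ y
print("training error:", np.linalg.norm(X_lin @ beta_hat - y))  # ~0, i.e. interpolation
```

The same pseudoinverse step applied to X_nonlin gives the interpolator for the random-features model; the paper's results concern the out-of-sample risk of these estimators, which this sketch does not compute.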