Sparse Optimization on Measures with Over-parameterized Gradient Descent
Lenaic Chizat
Code (official): github.com/lchizat/2019-sparse-optim-measures
Abstract
Minimizing a convex function of a measure with a sparsity-inducing penalty is a typical problem arising, e.g., in sparse spikes deconvolution or two-layer neural networks training. We show that this problem can be solved by discretizing the measure and running non-convex gradient descent on the positions and weights of the particles. For measures on a d-dimensional manifold and under some non-degeneracy assumptions, this leads to a global optimization algorithm with a complexity scaling as log(1/ε) in the desired accuracy ε, instead of ε^{-d} for convex methods. The key theoretical tools are a local convergence analysis in Wasserstein space and an analysis of a perturbed mirror descent in the space of measures. Our bounds involve quantities that are exponential in d, which is unavoidable under our assumptions.
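To make the discretization step concrete, here is a minimal NumPy sketch of the particle approach on a toy 1D sparse-spikes deconvolution instance. The Gaussian kernel, the least-squares-plus-L1 objective, and all hyper-parameters below are illustrative assumptions, not the paper's experimental setup or code; the paper's guarantees concern the general convex-on-measures problem under its non-degeneracy assumptions.

```python
import numpy as np

# Toy sparse-spikes deconvolution on [0, 1]: recover a sparse measure
# mu = sum_j a_j delta_{t_j} from observations y_k = sum_j a_j phi(s_k - t_j).
# Illustrative sketch only: kernel, objective and step sizes are assumptions.

rng = np.random.default_rng(0)
sigma = 0.05                       # kernel width (assumption)
s = np.linspace(0.0, 1.0, 100)     # observation grid

def phi(x):
    """Gaussian kernel features: one row of observations per particle position."""
    return np.exp(-(s[None, :] - x[:, None]) ** 2 / (2 * sigma ** 2))

# Ground truth: 3 signed spikes
t_true = np.array([0.25, 0.50, 0.80])
a_true = np.array([1.0, -0.7, 0.5])
y = a_true @ phi(t_true)

# Over-parameterized discretization: m >> 3 particles (w_i, x_i),
# approximating mu by sum_i w_i delta_{x_i}
m, lam = 50, 1e-3                  # particle count, L1 penalty (assumptions)
lr_w, lr_x, n_steps = 1e-2, 1e-3, 20000
x = rng.uniform(0.0, 1.0, m)       # positions spread over the domain
w = np.zeros(m)                    # weights started at zero

# Non-convex gradient descent on
# F(w, x) = 0.5 * ||sum_i w_i phi(. - x_i) - y||^2 + lam * sum_i |w_i|
for _ in range(n_steps):
    feats = phi(x)                              # (m, len(s))
    resid = w @ feats - y                       # model minus observations
    grad_w = feats @ resid + lam * np.sign(w)   # (sub)gradient in the weights
    dfeats = feats * (s[None, :] - x[:, None]) / sigma ** 2  # d(feats)/dx_i
    grad_x = w * (dfeats @ resid)               # gradient in the positions
    w -= lr_w * grad_w
    x -= lr_x * grad_x

# Particles carrying non-negligible weight should cluster near the true spikes
keep = np.abs(w) > 1e-2
print("recovered positions:", np.sort(x[keep]))
print("true positions:     ", t_true)
```

The over-parameterization (m = 50 particles for 3 ground-truth spikes) is the essential ingredient: with enough particles spread over the domain, plain gradient descent on their positions and weights can reach the global optimum, which is what the paper's local Wasserstein convergence and perturbed mirror descent analyses establish under its assumptions.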