Analysis of learning a flow-based generative model from limited sample complexity
Hugo Cui, Florent Krzakala, Eric Vanden-Eijnden, Lenka Zdeborová
Code: github.com/spoc-group/diffusion_gmm (official PyTorch implementation)
Abstract
We study the problem of training a flow-based generative model, parametrized by a two-layer autoencoder, to sample from a high-dimensional Gaussian mixture. We provide a sharp end-to-end analysis of the problem. First, we give a tight closed-form characterization of the learnt velocity field when it is parametrized by a shallow denoising auto-encoder trained on a finite number n of samples from the target distribution. Building on this analysis, we provide a sharp description of the corresponding generative flow, which pushes the base Gaussian density forward to an approximation of the target density. In particular, we provide closed-form formulae for the distance between the mean of the generated mixture and the mean of the target mixture, which we show decays as Θ_n(1/n). Finally, we show that this rate is in fact Bayes-optimal.
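The sketch below illustrates the general setup described in the abstract: a shallow network with a skip connection is trained on n samples from a two-component Gaussian mixture to approximate a velocity field, and the resulting flow is integrated from the base Gaussian to generate samples. The linear interpolant schedule, the network architecture (class name `ShallowVelocity`), the mixture parameters, and all hyperparameters are illustrative assumptions, not the paper's exact parametrization or training procedure; see the official repository for the authors' implementation.

```python
# Minimal sketch (not the authors' exact setup): learn a velocity field with a
# shallow autoencoder-style network and sample a Gaussian mixture via the flow.
import torch
import torch.nn as nn

torch.manual_seed(0)
d, n = 8, 4096                                     # dimension, number of training samples

# Target: symmetric two-component Gaussian mixture 0.5 N(+mu, I) + 0.5 N(-mu, I) (assumed for illustration)
mu = torch.randn(d)
labels = torch.randint(0, 2, (n,)) * 2 - 1
x1_data = labels[:, None] * mu + torch.randn(n, d)

class ShallowVelocity(nn.Module):
    """Two-layer network with a time-dependent skip connection (illustrative architecture)."""
    def __init__(self, d, hidden=64):
        super().__init__()
        self.skip = nn.Linear(1, 1)                # scalar skip weight depending on t
        self.enc = nn.Linear(d + 1, hidden)
        self.dec = nn.Linear(hidden, d)

    def forward(self, x, t):
        t = t.expand(x.shape[0], 1)
        h = torch.tanh(self.enc(torch.cat([x, t], dim=1)))
        return self.skip(t) * x + self.dec(h)

model = ShallowVelocity(d)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Training objective: with a linear interpolant x_t = (1-t) x0 + t x1 between the base
# Gaussian x0 and the data x1, regress the velocity onto x1 - x0 (a standard flow-matching loss).
for step in range(2000):
    idx = torch.randint(0, n, (256,))
    x1 = x1_data[idx]
    x0 = torch.randn_like(x1)                      # base Gaussian samples
    t = torch.rand(x1.shape[0], 1)
    xt = (1 - t) * x0 + t * x1
    loss = ((model(xt, t) - (x1 - x0)) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Generative flow: integrate dx/dt = b(x, t) from the base Gaussian with a simple Euler scheme.
with torch.no_grad():
    x = torch.randn(2000, d)
    steps = 100
    for k in range(steps):
        t = torch.full((1, 1), k / steps)
        x = x + model(x, t) / steps

# Compare the means of the generated clusters with the target means +/- mu
# (the quantity whose error the paper shows decays as Theta_n(1/n)).
proj = x @ mu / mu.norm()
print("generated cluster means along mu:", proj[proj > 0].mean().item(), proj[proj < 0].mean().item())
print("target mean norm:", mu.norm().item())
```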