Cyclical Stochastic Gradient MCMC for Bayesian Deep Learning
Ruqi Zhang, Chunyuan Li, Jianyi Zhang, Changyou Chen, Andrew Gordon Wilson
Code
- github.com/ruqizhang/csgmcmc (official, PyTorch)
- github.com/WayneDW/Contour-Stochastic-Gradient-Langevin-Dynamics
- github.com/cobypenso/functional_ensemble_distillation (PyTorch)
Abstract
The posteriors over neural network weights are high-dimensional and multimodal. Each mode typically characterizes a meaningfully different representation of the data. We develop Cyclical Stochastic Gradient MCMC (SG-MCMC) to automatically explore such distributions. In particular, we propose a cyclical stepsize schedule, where larger steps discover new modes, and smaller steps characterize each mode. We also prove non-asymptotic convergence of our proposed algorithm. Moreover, we provide extensive experimental results, including on ImageNet, to demonstrate the scalability and effectiveness of cyclical SG-MCMC in learning complex multimodal distributions, especially for fully Bayesian inference with modern deep neural networks.
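To make the cyclical schedule concrete, here is a minimal toy sketch of cyclical SGLD on a one-dimensional bimodal density. The cosine-decay stepsize restarts every cycle; within each cycle, the early (large-stepsize) steps take plain gradient-ascent moves for exploration, and the later (small-stepsize) steps add Langevin noise and are retained as samples. All function names, the full-batch gradient, the `explore_frac` split, and the toy target are illustrative choices, not the paper's experimental setup.

```python
import numpy as np

def cyclical_stepsize(k, K, M, alpha0):
    """Cosine cyclical stepsize: M restarts over K total iterations,
    decaying from alpha0 toward 0 within each cycle."""
    cycle_len = int(np.ceil(K / M))
    t = (k % cycle_len) / cycle_len          # position within current cycle, in [0, 1)
    return (alpha0 / 2.0) * (np.cos(np.pi * t) + 1.0)

def cyclical_sgld(grad_log_p, theta0, K=6000, M=4, alpha0=0.05,
                  explore_frac=0.5, seed=0):
    """Toy cyclical SGLD (full-batch gradient for simplicity).
    Exploration stage: deterministic gradient step with a large stepsize.
    Sampling stage: SGLD update (gradient step + injected Gaussian noise)."""
    rng = np.random.default_rng(seed)
    theta = float(theta0)
    samples = []
    cycle_len = int(np.ceil(K / M))
    for k in range(K):
        a = cyclical_stepsize(k, K, M, alpha0)
        theta += a * grad_log_p(theta)                    # gradient step
        if (k % cycle_len) / cycle_len >= explore_frac:   # sampling stage
            theta += np.sqrt(2.0 * a) * rng.standard_normal()
            samples.append(theta)
    return np.array(samples)

# Toy bimodal target: equal mixture of N(-2, 0.5^2) and N(2, 0.5^2)
def grad_log_p(x, mu=(-2.0, 2.0), s=0.5):
    w = np.array([np.exp(-(x - m) ** 2 / (2 * s ** 2)) for m in mu])
    w /= w.sum()                                          # responsibilities
    return float(sum(wi * (m - x) / s ** 2 for wi, m in zip(w, mu)))

samples = cyclical_sgld(grad_log_p, theta0=0.0)
```

With `M` cycles the sampler gets `M` fresh large-stepsize restarts, which is the mechanism the abstract describes for discovering separate modes; the small-stepsize tail of each cycle then characterizes the mode it landed in.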