Gradient-based Hyperparameter Optimization through Reversible Learning
2015-02-11
Dougal Maclaurin, David Duvenaud, Ryan P. Adams
Code
- github.com/HIPS/hypergrad (official, referenced in paper)
- github.com/Przemo23/Reversing_Gradient_Differentiation_NN (TensorFlow)
Abstract
Tuning hyperparameters of learning algorithms is hard because gradients are usually unavailable. We compute exact gradients of cross-validation performance with respect to all hyperparameters by chaining derivatives backwards through the entire training procedure. These gradients allow us to optimize thousands of hyperparameters, including step-size and momentum schedules, weight initialization distributions, richly parameterized regularization schemes, and neural network architectures. We compute hyperparameter gradients by exactly reversing the dynamics of stochastic gradient descent with momentum.
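The last sentence of the abstract is the key mechanical idea: because each SGD-with-momentum update is invertible, the training trajectory can be replayed backwards from the final state alone, which is what makes backpropagating through the whole training run feasible in memory. Below is a minimal sketch of that reversal on a toy quadratic loss; it is not the authors' code, the function and variable names (`loss_grad`, `alphas`, `gammas`) are illustrative, and it omits the paper's finite-precision bookkeeping (storing the low-order bits lost when the velocity is scaled by the momentum decay), so the reversal here is only exact up to floating-point round-off.

```python
# Sketch: exactly reversing SGD-with-momentum dynamics (not the authors' code).
# Forward:  v <- gamma*v - (1-gamma)*grad(w);  w <- w + alpha*v
# Reverse:  w <- w - alpha*v;                  v <- (v + (1-gamma)*grad(w)) / gamma
import numpy as np

def loss_grad(w):
    # Gradient of a toy quadratic loss L(w) = 0.5 * ||w||^2.
    return w

def sgd_momentum_forward(w, v, alphas, gammas):
    """Run SGD with momentum, keeping only the final state."""
    for alpha, gamma in zip(alphas, gammas):
        v = gamma * v - (1.0 - gamma) * loss_grad(w)
        w = w + alpha * v
    return w, v

def sgd_momentum_reverse(w, v, alphas, gammas):
    """Invert the forward dynamics step by step, newest step first."""
    for alpha, gamma in zip(reversed(alphas), reversed(gammas)):
        w = w - alpha * v
        v = (v + (1.0 - gamma) * loss_grad(w)) / gamma
    return w, v

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w0 = rng.normal(size=5)
    v0 = np.zeros(5)
    alphas = [0.1] * 50   # step-size schedule (itself a tunable hyperparameter)
    gammas = [0.9] * 50   # momentum schedule (itself a tunable hyperparameter)

    wT, vT = sgd_momentum_forward(w0, v0, alphas, gammas)
    w0_rec, v0_rec = sgd_momentum_reverse(wT, vT, alphas, gammas)

    print("weight reconstruction error:  ", np.max(np.abs(w0 - w0_rec)))
    print("velocity reconstruction error:", np.max(np.abs(v0 - v0_rec)))
```

In the paper's full algorithm, the reverse pass does more than recover the weights: at each reversed step it also accumulates derivatives of the validation loss with respect to the step-size and momentum schedules (and any other hyperparameters), which is what yields exact hypergradients without storing the entire weight trajectory.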