Learning to Learn without Gradient Descent by Gradient Descent

2016-11-11ICML 2017Unverified0· sign in to hype

Yutian Chen, Matthew W. Hoffman, Sergio Gomez Colmenarejo, Misha Denil, Timothy P. Lillicrap, Matt Botvinick, Nando de Freitas

arXiv PDF

Unverified — Be the first to reproduce this paper.

Reproduce

Abstract

We learn recurrent neural network optimizers trained on simple synthetic functions by gradient descent. We show that these learned optimizers exhibit a remarkable degree of transfer in that they can be used to efficiently optimize a broad range of derivative-free black-box functions, including Gaussian process bandits, simple control objectives, global optimization benchmarks and hyper-parameter tuning tasks. Up to the training horizon, the learned optimizers learn to trade-off exploration and exploitation, and compare favourably with heavily engineered Bayesian optimization packages for hyper-parameter tuning.

Tasks

Bayesian Optimization global-optimization

Learning to Learn without Gradient Descent by Gradient Descent

Abstract

Tasks

Reproductions