Backpropagation through the Void: Optimizing control variates for black-box gradient estimation
Will Grathwohl, Dami Choi, Yuhuai Wu, Geoffrey Roeder, David Duvenaud
Code
- github.com/duvenaud/relax — TensorFlow (official, in paper), ★ 0
- github.com/thlautenschlaeger/bpttv-lax — PyTorch, ★ 1
- github.com/brain-research/mirage-rl — TensorFlow, ★ 0
- github.com/wgrathwohl/BackpropThroughTheVoidRL — TensorFlow, ★ 0
- github.com/ElleryL/gradient_estimator — PyTorch, ★ 0
- github.com/Bonnevie/rebar — TensorFlow, ★ 0
- github.com/TalkToTheGAN/REGAN — PyTorch, ★ 0
Abstract
Gradient-based optimization is the foundation of deep learning and reinforcement learning. Even when the mechanism being optimized is unknown or not differentiable, optimization using high-variance or biased gradient estimates is still often the best strategy. We introduce a general framework for learning low-variance, unbiased gradient estimators for black-box functions of random variables. Our method uses gradients of a neural network trained jointly with model parameters or policies, and is applicable in both discrete and continuous settings. We demonstrate this framework for training discrete latent-variable models. We also give an unbiased, action-conditional extension of the advantage actor-critic reinforcement learning algorithm.
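The abstract's core idea — an unbiased score-function estimator whose variance is reduced by a jointly trained surrogate acting as a control variate — can be illustrated on a toy continuous problem. The sketch below is not the paper's implementation: it estimates d/dθ E_{b~N(θ,1)}[f(b)] for a black-box f, uses a hand-picked quadratic surrogate c_φ(b) = φ₀ + φ₁b + φ₂b² in place of a neural network, and fits φ by stochastic gradient descent on the squared single-sample estimate (a variance proxy, since the estimator's mean is fixed for any φ). All names and the choice of surrogate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 1.5  # distribution parameter: b ~ N(theta, 1)

def f(b):
    # Black-box objective: we may evaluate it, but never differentiate through it.
    return b ** 2

def lax(theta, phi, n):
    """Batched estimates of d/dtheta E[f(b)] with a quadratic surrogate
    c_phi(b) = phi0 + phi1*b + phi2*b^2 (an illustrative stand-in for a
    neural-network control variate)."""
    b = theta + rng.standard_normal(n)         # reparameterized sample
    s = b - theta                              # score: d log N(b|theta,1) / d theta
    c = phi[0] + phi[1] * b + phi[2] * b ** 2  # control-variate value
    dc = phi[1] + 2.0 * phi[2] * b             # dc/dtheta, since db/dtheta = 1
    # Unbiased for any phi: the two c-dependent terms cancel in expectation.
    g = (f(b) - c) * s + dc
    return g, b, s

phi = np.zeros(3)
for _ in range(3000):
    g, b, s = lax(theta, phi, 64)
    # Analytic dg/dphi for the quadratic surrogate (one row per component of phi).
    dg = np.stack([-s, -b * s + 1.0, -(b ** 2) * s + 2.0 * b])
    phi -= 1e-3 * (2.0 * g * dg).mean(axis=1)  # SGD on the squared-estimate proxy

g_lax, _, _ = lax(theta, phi, 20000)
b = theta + rng.standard_normal(20000)
g_sf = f(b) * (b - theta)                      # plain score-function (REINFORCE)
print(g_lax.mean(), g_lax.var(), g_sf.var())   # true gradient is 2*theta = 3.0
```

Both estimators are unbiased, but after training the control variate, the variance of the corrected estimator drops well below that of the plain score-function baseline — the effect the paper exploits, there with a learned neural surrogate and discrete as well as continuous variables.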