Buy 4 REINFORCE Samples, Get a Baseline for Free!

2019-03-16 · ICLR Workshop drlStructPred 2019

Wouter Kool, Herke van Hoof, Max Welling

Abstract

REINFORCE can be used to train models in structured prediction settings to directly optimize the test-time objective. However, the common case of sampling one prediction per datapoint (input) is data-inefficient. We show that by drawing multiple samples (predictions) per datapoint, we can learn with significantly less data, as we freely obtain a REINFORCE baseline to reduce variance. Additionally, we derive a REINFORCE estimator with baseline, based on sampling without replacement. Combined with a recent technique to sample sequences without replacement using Stochastic Beam Search, this improves the training procedure for a sequence model that predicts the solution to the Travelling Salesman Problem.
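
To make the "free baseline" idea concrete, here is a minimal sketch of one way a baseline can be built from multiple samples: each sample's reward is compared with the mean reward of the other samples for the same input (a leave-one-out baseline). It is an illustration under stated assumptions, not the paper's implementation. It uses a toy categorical policy instead of a sequence model, it draws samples with replacement only (the without-replacement estimator from the abstract is not reproduced), and the names `reinforce_loo_gradient` and `exact_gradient` are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - logits.max()
    exp = np.exp(shifted)
    return exp / exp.sum()


def reinforce_loo_gradient(logits, rewards, k, rng):
    """Estimate d E[reward] / d logits from k samples drawn WITH replacement.

    Assumption: a toy categorical policy, where rewards[i] is the reward of
    outcome i. The baseline for each sample is the mean reward of the other
    k - 1 samples, so it needs no extra model or extra samples.
    """
    if k < 2:
        raise ValueError("a leave-one-out baseline needs at least 2 samples")
    probs = softmax(logits)
    samples = rng.choice(len(probs), size=k, p=probs)
    f = rewards[samples]
    # Leave-one-out baseline: mean of the other samples' rewards.
    baselines = (f.sum() - f) / (k - 1)
    advantages = f - baselines
    # Score function of a categorical policy: onehot(sample) - probs.
    scores = np.eye(len(probs))[samples] - probs
    return (advantages[:, None] * scores).mean(axis=0)


def exact_gradient(logits, rewards):
    """Closed-form gradient of the expected reward, for comparison."""
    probs = softmax(logits)
    return probs * (rewards - probs @ rewards)
```

Because each baseline is computed only from the other samples, it is independent of the sample it is applied to, so the estimate stays unbiased. Averaging many calls to `reinforce_loo_gradient` should therefore approach `exact_gradient` for the same logits and rewards.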

Tasks

Prediction, Structured Prediction
