An Actor-Critic Algorithm for Sequence Prediction
Dzmitry Bahdanau, Philemon Brakel, Kelvin Xu, Anirudh Goyal, Ryan Lowe, Joelle Pineau, Aaron Courville, Yoshua Bengio
Code
- github.com/rizar/actor-critic-public (official, referenced in paper, framework: none, ★ 0)
- github.com/joeynmt/joeynmt (PyTorch, ★ 713)
- github.com/juliakreutzer/joeynmt (PyTorch, ★ 0)
Abstract
We present an approach to training neural networks to generate sequences using actor-critic methods from reinforcement learning (RL). Current log-likelihood training methods are limited by the discrepancy between their training and testing modes: at test time, models must generate tokens conditioned on their previous guesses rather than on the ground-truth tokens. We address this problem by introducing a critic network that is trained to predict the value of an output token, given the policy of an actor network. This results in a training procedure that is much closer to the test phase, and allows us to directly optimize for a task-specific score such as BLEU. Crucially, since we leverage these techniques in the supervised learning setting rather than the traditional RL setting, we condition the critic network on the ground-truth output. We show that our method leads to improved performance on both a synthetic task and German-English machine translation. Our analysis paves the way for such methods to be applied in natural language generation tasks, such as machine translation, caption generation, and dialogue modelling.
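The core idea in the abstract, an actor that samples its own tokens and a critic that scores them, can be illustrated with a toy single-step update. This is a minimal sketch, not the paper's architecture: the "state" stands in for a decoder hidden state, the exact-match reward stands in for a BLEU gain, and all names (`W_actor`, `w_critic`, `train_step`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, HID = 5, 8

# Hypothetical toy parameters: actor = softmax over a linear layer,
# critic = linear value head. Shapes are illustrative only.
W_actor = rng.normal(scale=0.1, size=(HID, VOCAB))
w_critic = rng.normal(scale=0.1, size=HID)

def softmax(z):
    z = z - z.max()          # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum()

def train_step(state, target, lr=0.1):
    """One toy actor-critic update for a single decoding step.

    `target` is the ground-truth token; in the paper's supervised
    setting the critic is allowed to condition on it.
    """
    probs = softmax(state @ W_actor)
    action = rng.choice(VOCAB, p=probs)        # actor samples its own token
    reward = 1.0 if action == target else 0.0  # toy stand-in for a BLEU gain
    value = state @ w_critic                   # critic's value estimate

    # Critic: regress the value estimate toward the observed reward.
    td_error = reward - value
    w_critic_new = w_critic + lr * td_error * state

    # Actor: policy gradient on log p(action), weighted by the critic's
    # error signal instead of the raw reward (variance reduction).
    grad_logp = -probs
    grad_logp[action] += 1.0
    W_actor_new = W_actor + lr * td_error * np.outer(state, grad_logp)
    return action, td_error, W_actor_new, w_critic_new
```

In the paper the critic instead predicts per-token values over whole sequences and the actor is a full encoder-decoder; the sketch only shows how the critic's signal replaces the raw task score in the actor's gradient.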
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| IWSLT2014 German-English | Actor-Critic [Bahdanau2017] | BLEU score | 28.53 | — | Unverified |
| IWSLT2015 English-German | RNNsearch | BLEU score | 25.04 | — | Unverified |
| IWSLT2015 German-English | RNNsearch | BLEU score | 29.98 | — | Unverified |