
Structured Prediction via Learning to Search under Bandit Feedback

2017-09-01 · WS 2017

Amr Sharaf, Hal Daumé III


Abstract

We present an algorithm for structured prediction under online bandit feedback. The learner repeatedly predicts a sequence of actions, generating a structured output. It then observes feedback for that output and no others. We consider two cases: a pure bandit setting, in which it observes only a single loss, and a more fine-grained setting, in which it observes a loss for every action. We find that the fine-grained feedback is necessary for strong empirical performance, because it allows for a robust variance-reduction strategy. We empirically compare a number of different algorithms and exploration methods and show the efficacy of the proposed algorithm, BLS, on sequence labeling and dependency parsing tasks.
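The fine-grained feedback setting described in the abstract can be illustrated with a small sketch. This is not the paper's BLS algorithm and does not reproduce its variance-reduction strategy. It is a generic illustration that assumes epsilon-greedy exploration and a standard inverse-propensity estimate of the per-step cost vector. All function names and parameters below are hypothetical.

```python
import random

def epsilon_greedy_probs(greedy_action, num_actions, epsilon):
    """Probability of each action under epsilon-greedy exploration (assumed scheme)."""
    probs = [epsilon / num_actions] * num_actions
    probs[greedy_action] += 1.0 - epsilon
    return probs

def ips_cost_estimate(chosen_action, observed_loss, probs):
    """Inverse-propensity estimate of the full cost vector from one observed loss.

    Only the chosen action's loss is known; dividing it by the probability of
    choosing that action makes the estimate unbiased in expectation.
    """
    estimate = [0.0] * len(probs)
    estimate[chosen_action] = observed_loss / probs[chosen_action]
    return estimate

def bandit_episode(true_losses, greedy_actions, epsilon, rng):
    """Predict one action per position and observe a loss only for that action.

    true_losses[t][a] is the (hidden) loss of action a at step t;
    greedy_actions[t] stands in for the current policy's prediction.
    """
    outputs, estimates = [], []
    for step_losses, greedy in zip(true_losses, greedy_actions):
        probs = epsilon_greedy_probs(greedy, len(step_losses), epsilon)
        action = rng.choices(range(len(step_losses)), weights=probs)[0]
        loss = step_losses[action]  # the only loss revealed at this step
        outputs.append(action)
        estimates.append(ips_cost_estimate(action, loss, probs))
    return outputs, estimates
```

Calling `bandit_episode([[0.0, 1.0], [1.0, 0.0]], [0, 1], 0.2, random.Random(0))` returns the sampled actions together with one estimated cost vector per step. Each estimate is nonzero in at most one entry, which is why such estimates tend to have high variance and why a variance-reduction strategy matters in this setting.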
