Structured Prediction via Learning to Search under Bandit Feedback

2017-09-01 · WS 2017

Amr Sharaf, Hal Daumé III

Abstract

We present an algorithm for structured prediction under online bandit feedback. The learner repeatedly predicts a sequence of actions, generating a structured output. It then observes feedback for that output and no others. We consider two cases: a pure bandit setting in which it only observes a loss, and more fine-grained feedback in which it observes a loss for every action. We find that the fine-grained feedback is necessary for strong empirical performance, because it allows for a robust variance-reduction strategy. We empirically compare a number of different algorithms and exploration methods and show the efficacy of our algorithm, BLS, on sequence labeling and dependency parsing tasks.
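
The two feedback settings described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm. It only shows what the learner gets to observe in each case, assuming a toy sequence-labeling task, a Hamming-style loss, and epsilon-greedy exploration. The function names, the score table, and the gold labels are all hypothetical and chosen for illustration.

```python
import random


def predict_with_exploration(scores, epsilon, rng):
    """Choose one label per position.

    With probability epsilon pick a uniformly random label (exploration),
    otherwise pick the highest-scoring label (greedy). Epsilon-greedy is an
    assumption made for this sketch only.
    """
    actions = []
    for position_scores in scores:
        if rng.random() < epsilon:
            actions.append(rng.randrange(len(position_scores)))
        else:
            best = max(range(len(position_scores)), key=position_scores.__getitem__)
            actions.append(best)
    return actions


def pure_bandit_feedback(actions, gold):
    """Pure bandit setting: one scalar loss for the whole predicted output."""
    return sum(a != g for a, g in zip(actions, gold))


def per_action_feedback(actions, gold):
    """Fine-grained setting: one loss value for every action that was taken."""
    return [int(a != g) for a, g in zip(actions, gold)]


# Hypothetical per-position label scores for a 3-token input with 2 labels.
scores = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
# The gold labels are visible only to the feedback functions, never to the learner.
gold = [1, 0, 0]

actions = predict_with_exploration(scores, epsilon=0.0, rng=random.Random(0))
total_loss = pure_bandit_feedback(actions, gold)
action_losses = per_action_feedback(actions, gold)
```

In both cases the learner sees feedback only for the output it actually produced. The per-action list identifies which decision was wrong, whereas the single scalar does not, which is the extra information the fine-grained setting provides.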

Tasks

Active Learning, Dependency Parsing, Prediction, Structured Prediction
