Feedback Attribution for Counterfactual Bandit Learning in Multi-Domain Spoken Language Understanding

EMNLP 2021 · 2021-11-01

Tobias Falke, Patrick Lehnen

Abstract

With counterfactual bandit learning, models can be trained on positive and negative feedback received for historical predictions, with no labeled data needed. Such feedback is often available in real-world dialog systems; however, the modularized architecture commonly used in large-scale systems prevents the direct application of such algorithms. In this paper, we study the feedback attribution problem that arises when using counterfactual bandit learning for multi-domain spoken language understanding. We introduce an experimental setup to simulate the problem on small-scale public datasets, propose attribution methods inspired by multi-agent reinforcement learning, and evaluate them against multiple baselines. We find that while directly using overall feedback leads to disastrous performance, our proposed attribution methods allow training competitive models from user feedback.
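As a minimal illustration of the counterfactual bandit learning setting the abstract refers to (not the paper's attribution method), one common approach is to maximize an inverse propensity scoring (IPS) estimate of expected reward over logged data. The sketch below uses synthetic data and hypothetical names; the logging policy, propensities, and reward simulation are assumptions for demonstration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical logged bandit data: contexts X, actions A chosen by a uniform
# logging policy, binary feedback R, and logging propensities P0.
n, d, k = 2000, 5, 3
X = rng.normal(size=(n, d))
true_w = rng.normal(size=(d, k))
best = np.argmax(X @ true_w, axis=1)   # "correct" action per context
A = rng.integers(0, k, size=n)         # uniform logging policy
R = (A == best).astype(float)          # positive feedback iff action correct
P0 = np.full(n, 1.0 / k)               # logging propensity of each action

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Train a linear softmax policy by gradient ascent on the IPS estimate of
# expected reward: mean_i R_i * pi(A_i | X_i) / P0_i. No labels are used,
# only the logged feedback for historical predictions.
W = np.zeros((d, k))
lr = 0.5
for _ in range(200):
    pi = softmax(X @ W)                          # (n, k) policy probabilities
    pi_a = pi[np.arange(n), A]                   # prob. of the logged action
    w_ips = R / P0                               # importance weights
    # Gradient of the IPS objective w.r.t. the logits (softmax derivative).
    G = -pi * (w_ips * pi_a)[:, None]
    G[np.arange(n), A] += w_ips * pi_a
    W += lr * (X.T @ G) / n

learned = np.argmax(X @ W, axis=1)
accuracy = (learned == best).mean()
```

On this toy data the learned policy recovers the correct action for most contexts, despite never seeing the correct action directly, only bandit feedback for whichever action the logging policy happened to take.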
