End-to-end Task-oriented Dialog Policy Learning based on Pre-trained Language Model

2021-11-16 · ACL ARR November 2021

Anonymous

Abstract

This paper presents our approach to dialog policy learning (DPL), which aims to determine the system's next action based on the current dialog state maintained by a dialog state tracking module. Unlike previous stage-wise DPL, we propose an end-to-end DPL system that avoids error accumulation between dialog turns. The DPL system is developed from two perspectives. First, we consider turn-level DPL, which selects the best dialog action from a predefined action set. Specifically, we propose a dialog action-oriented BERT (DA-BERT), which integrates a new pre-training procedure, the masked last action (MLA) task, that encourages BERT to be dialog-aware and to distill action-specific features. Second, we propose word-level DPL, which directly generates the dialog action. We model DPL as a sequence generation task conditioned on the dialog action structure, and apply GPT-2 equipped with an action structure parser module (termed DA-GPT-2) to learn the word-level DPL. The effectiveness and distinct characteristics of the proposed models are demonstrated on in-domain and domain adaptation tasks on MultiWOZ, using both simulator evaluation and human evaluation.

Tasks

Dialog State Tracking, Domain Adaptation, Language Modeling
