Post-processing Networks: A Method for Optimizing Pipeline Task-oriented Dialogue Systems using Reinforcement Learning

2021-11-16 · ACL ARR November 2021

Anonymous

Abstract

Many studies have proposed methods for optimizing the dialogue performance of an entire pipeline system by jointly training its modules with reinforcement learning. However, these methods can only be applied to modules implemented with trainable neural-based methods. To address this problem, we propose a method for optimizing the dialogue performance of a pipeline system whose modules may be implemented with arbitrary methods. In our method, neural-based components called post-processing networks (PPNs) are installed inside the system to post-process the output of each module. All PPNs are updated with reinforcement learning to improve the overall dialogue performance of the system, without requiring any module itself to be updated. Through dialogue simulation experiments on the MultiWOZ dataset, we show that PPNs can improve the dialogue performance of pipeline systems consisting of various modules.
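
The idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the abstract says only "reinforcement learning", so the sketch assumes plain REINFORCE with a moving-average baseline, assumes module outputs are encoded as binary vectors, and uses hypothetical stand-ins (`frozen_module`, `reward`, `PostProcessingNetwork`) for a non-trainable module, a dialogue-level success signal, and a PPN. The module is never updated; only the PPN that post-processes its output learns.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class PostProcessingNetwork:
    """Toy PPN (hypothetical design): one Bernoulli policy per output bit."""

    def __init__(self, size, lr=0.5):
        # One weight on the module's bit plus one bias per output bit.
        # w=4, b=-2 starts close to "copy the module output unchanged".
        self.w = [4.0] * size
        self.b = [-2.0] * size
        self.lr = lr

    def probs(self, x):
        return [sigmoid(w * xi + b) for w, b, xi in zip(self.w, self.b, x)]

    def sample(self, x, rng):
        return [1 if rng.random() < p else 0 for p in self.probs(x)]

    def greedy(self, x):
        return [1 if p >= 0.5 else 0 for p in self.probs(x)]

    def update(self, x, y, advantage):
        # REINFORCE: d/d(logit) of log Bernoulli(y | p) equals (y - p).
        for i, p in enumerate(self.probs(x)):
            g = self.lr * advantage * (y[i] - p)
            self.w[i] += g * x[i]
            self.b[i] += g


def frozen_module(goal):
    """Stand-in for a non-trainable module (e.g. rule-based).

    It has a systematic flaw: it always drops bit 2 of its output.
    """
    out = list(goal)
    out[2] = 0
    return out


def reward(goal, output):
    """Stand-in for a dialogue-level reward: 1 on task success, else 0."""
    return 1.0 if output == goal else 0.0


def train(episodes=3000, seed=0):
    rng = random.Random(seed)
    ppn = PostProcessingNetwork(size=3)
    baseline = 0.0
    for _ in range(episodes):
        goal = [rng.randint(0, 1), rng.randint(0, 1), 1]
        x = frozen_module(goal)        # the module itself is never updated
        y = ppn.sample(x, rng)         # the PPN post-processes its output
        r = reward(goal, y)
        ppn.update(x, y, r - baseline)
        baseline += 0.05 * (r - baseline)
    return ppn


if __name__ == "__main__":
    trained = train()
    print(trained.greedy(frozen_module([1, 0, 1])))
```

In this toy setting the reward depends only on the final output, so the PPN can learn to compensate for the module's flaw without any access to the module's internals, which is the property the abstract emphasizes.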

Tasks

Reinforcement Learning (RL), Task-Oriented Dialogue Systems
