
Consecutive Task-oriented Dialog Policy Learning

2021-11-16 · ACL ARR November 2021

Anonymous


Abstract

A practical dialog policy agent must be flexible and able to handle new scenarios easily. To achieve this, the agent should be able to expand its knowledge base without degrading its performance on previously learned tasks. Existing dialog systems, however, fail to do so: in practice, acquiring new knowledge and preserving old experience are conflicting objectives, a challenge that arises regularly in continual learning. We present a novel model that conducts consecutive dialog policy learning over a series of tasks without catastrophic forgetting. We tackle the issue from three aspects: (1) For effective old-task preservation, we employ a continual Q-learning module based on replayed experience to retain the policy trained on historic tasks. (2) For efficient new-task acquisition, we integrate an invariant risk minimization module to learn a stable policy predictor that avoids spurious correlations in the training data. (3) To reduce the memory needed to store replayed experiences, we introduce linear-decay replay buffer management. The effectiveness of the proposed model is evaluated both theoretically and experimentally, through simulation and human evaluation.
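The abstract does not specify how the linear-decay replay buffer is managed, so the following is only a minimal sketch of one plausible reading: each stored task receives a share of a fixed-capacity buffer, and that share shrinks linearly with the task's age, so older tasks keep fewer replayed experiences. The class name, the `decay` parameter, and the rebalancing rule are all assumptions for illustration, not the paper's actual method.

```python
import random


class LinearDecayReplayBuffer:
    """Hypothetical sketch: a fixed-capacity replay buffer where each
    task's quota of stored experiences decays linearly with task age."""

    def __init__(self, capacity, decay=0.2):
        self.capacity = capacity  # total experiences kept across all tasks
        self.decay = decay        # linear decay rate per task of age (assumed)
        self.tasks = []           # list of (task_id, experiences), oldest first

    def add_task(self, task_id, experiences):
        """Store experiences for a newly finished task, then rebalance."""
        self.tasks.append((task_id, list(experiences)))
        self._rebalance()

    def _rebalance(self):
        """Shrink each task's allocation linearly with its age; the newest
        task gets weight 1.0, older tasks progressively less (floored at
        `decay` so no task is dropped entirely)."""
        n = len(self.tasks)
        weights = [max(1.0 - self.decay * (n - 1 - i), self.decay)
                   for i in range(n)]
        total = sum(weights)
        for (_, exp), w in zip(self.tasks, weights):
            quota = int(self.capacity * w / total)
            del exp[quota:]  # keep only the first `quota` experiences

    def sample(self, batch_size):
        """Draw a uniform random batch from the pooled experiences."""
        pool = [e for _, exp in self.tasks for e in exp]
        return random.sample(pool, min(batch_size, len(pool)))
```

Under this assumed scheme, total memory stays bounded by `capacity` while every historic task retains at least a small slice for the continual Q-learning module to replay.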
