Consecutive Task-oriented Dialog Policy Learning
Anonymous
Abstract
A practical dialog policy agent must be flexible enough to handle new scenarios easily. To achieve this, the agent should be able to expand its knowledge base without degrading its performance on earlier tasks. Existing dialog systems fail to do so: in practice, acquiring new knowledge and preserving old experience are conflicting objectives, a challenge that arises regularly in continual learning. We present a novel model that conducts consecutive dialog policy learning over a series of tasks without catastrophic forgetting. We tackle the issue from three aspects: (1) For effective preservation of old tasks, we employ a continual Q-learning module, based on replayed experience, to retain the policy trained on historic tasks. (2) For efficient acquisition of new tasks, we integrate an invariant risk minimization module to learn a stable policy predictor that avoids spurious correlations in the training data. (3) To reduce the storage required for replayed experiences, we introduce linear-decay replay buffer management. The effectiveness of the proposed model is evaluated theoretically and experimentally, through both simulation and human evaluation.
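The abstract does not specify how the linear-decay replay buffer is managed; one plausible reading is that the per-task storage quota shrinks linearly with task age, so older tasks keep progressively fewer replayed transitions. The sketch below illustrates that interpretation only; the class name, parameters (`base_capacity`, `decay`, `min_capacity`), and eviction policy are all assumptions, not the paper's actual design.

```python
import random


class LinearDecayReplayBuffer:
    """Hypothetical sketch of linear-decay replay buffer management:
    each past task's capacity decays linearly with its age, bounding
    total storage while keeping some experience from every old task."""

    def __init__(self, base_capacity=1000, decay=200, min_capacity=100):
        self.base_capacity = base_capacity  # capacity for the current task
        self.decay = decay                  # capacity lost per unit of task age
        self.min_capacity = min_capacity    # floor so old tasks are never fully dropped
        self.buffers = []                   # buffers[i] holds transitions for task i

    def start_task(self):
        """Open a buffer for a new task and shrink older buffers by age."""
        self.buffers.append([])
        for age, buf in enumerate(reversed(self.buffers)):
            cap = max(self.base_capacity - age * self.decay, self.min_capacity)
            if len(buf) > cap:
                del buf[: len(buf) - cap]  # evict the oldest transitions first

    def add(self, transition):
        """Store a transition for the current task, respecting its capacity."""
        buf = self.buffers[-1]
        if len(buf) >= self.base_capacity:
            buf.pop(0)
        buf.append(transition)

    def sample(self, k):
        """Draw a mixed batch across all tasks for continual Q-learning replay."""
        pool = [t for buf in self.buffers for t in buf]
        return random.sample(pool, min(k, len(pool)))
```

Under this scheme, replay batches still mix transitions from every retained task, which is what lets the continual Q-learning module rehearse old policies while total memory stays bounded.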