Assessing Dialogue Systems with Distribution Distances
Jiannan Xiang, Yahui Liu, Deng Cai, Huayang Li, Defu Lian, Lemao Liu
Code Available — Be the first to reproduce this paper.
ReproduceCode
- github.com/yhlleo/frechet-bert-distanceOfficialIn paperpytorch★ 24
Abstract
An important aspect of developing dialogue systems is how to evaluate and compare the performance of different systems. Existing automatic evaluation metrics are based on turn-level quality evaluation and use average scores for system-level comparison. In this paper, we propose to measure the performance of a dialogue system by computing the distribution-wise distance between its generated conversations and real-world conversations. Specifically, two distribution-wise metrics, FBD and PRD, are developed and evaluated. Experiments on several dialogue corpora show that our proposed metrics correlate better with human judgments than existing metrics.