Similarity Scoring for Dialogue Behaviour Comparison

2020-07-01 · SIGDIAL (ACL) 2020

Stefan Ultes, Wolfgang Maier

Abstract

The differences in decision making between behavioural models of voice interfaces are hard to capture using existing measures for the absolute performance of such models. For instance, two models may have a similar task success rate, but very different ways of getting there. In this paper, we propose a general methodology to compute the similarity of two dialogue behaviour models and investigate different ways of computing scores on both the semantic and the textual level. We test our scores, which complement absolute measures of performance, on three different tasks and show the practical usability of the measures.
