SOTAVerified

Dialogue Evaluation

Papers

Showing 71–80 of 97 papers

| Title | Status | Hype |
|---|---|---|
| Investigating the Impact of Pre-trained Language Models on Dialog Evaluation | | 0 |
| Achieving Reliable Human Assessment of Open-Domain Dialogue Systems | | 0 |
| A Human-machine Collaborative Framework for Evaluating Malevolence in Dialogues | Code | 0 |
| Enhancing the Open-Domain Dialogue Evaluation in Latent Space | | 0 |
| Transformers for Headline Selection for Russian News Clusters | Code | 0 |
| Synthesizing Adversarial Negative Responses for Robust Response Ranking and Evaluation | Code | 0 |
| Improving Automated Evaluation of Open Domain Dialog via Diverse Reference Augmentation | Code | 0 |
| Generating Negative Samples by Manipulating Golden Responses for Unsupervised Learning of a Response Evaluation Model | Code | 0 |
| DCH-2: A Parallel Customer-Helpdesk Dialogue Corpus with Distributions of Annotators' Labels | | 0 |
| An Adversarially-Learned Turing Test for Dialog Generation Models | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MDD-Eval | Spearman Correlation | 0.51 | | Unverified |
| 2 | Lin-Reg (all) | Spearman Correlation | 0.49 | | Unverified |
| 3 | USR | Spearman Correlation | 0.42 | | Unverified |
| 4 | USR - DR (x = c) | Spearman Correlation | 0.32 | | Unverified |
| 5 | USR - MLM | Spearman Correlation | 0.31 | | Unverified |
| 6 | USR - DR (x = f) | Spearman Correlation | 0.14 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Lin-Reg (all) | Spearman Correlation | 0.54 | | Unverified |
| 2 | USR - DR (x = c) | Spearman Correlation | 0.48 | | Unverified |
| 3 | USR | Spearman Correlation | 0.47 | | Unverified |
| 4 | USR - MLM | Spearman Correlation | 0.08 | | Unverified |
| 5 | USR - DR (x = f) | Spearman Correlation | -0.05 | | Unverified |