SOTAVerified

Dialogue Evaluation

Papers

Showing 51–75 of 97 papers

Title | Status | Hype
Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents | | 0
MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation | Code | 0
User Response and Sentiment Prediction for Automatic Dialogue Evaluation | | 0
Automatic Evaluation and Moderation of Open-domain Dialogue Systems | Code | 1
GCDF1: A Goal- and Context- Driven F-Score for Evaluating User Models | Code | 0
Proxy Indicators for the Quality of Open-domain Dialogues | Code | 0
Investigating the Impact of Pre-trained Language Models on Dialog Evaluation | | 0
Achieving Reliable Human Assessment of Open-Domain Dialogue Systems | | 0
Enhancing the Open-Domain Dialogue Evaluation in Latent Space | | 0
A Human-machine Collaborative Framework for Evaluating Malevolence in Dialogues | Code | 0
Transformers for Headline Selection for Russian News Clusters | Code | 0
Synthesizing Adversarial Negative Responses for Robust Response Ranking and Evaluation | Code | 0
A Comprehensive Assessment of Dialog Evaluation Metrics | Code | 1
Improving Automated Evaluation of Open Domain Dialog via Diverse Reference Augmentation | Code | 0
Conversations Are Not Flat: Modeling the Dynamic Information Flow across Dialogue Utterances | Code | 1
DynaEval: Unifying Turn and Dialogue Level Evaluation | Code | 1
Generating Negative Samples by Manipulating Golden Responses for Unsupervised Learning of a Response Evaluation Model | Code | 0
Towards Quantifiable Dialogue Coherence Evaluation | Code | 1
Assessing Dialogue Systems with Distribution Distances | Code | 1
DCH-2: A Parallel Customer-Helpdesk Dialogue Corpus with Distributions of Annotators' Labels | | 0
Q^2: Evaluating Factual Consistency in Knowledge-Grounded Dialogues via Question Generation and Question Answering | Code | 1
An Adversarially-Learned Turing Test for Dialog Generation Models | Code | 0
WeChat AI & ICT's Submission for DSTC9 Interactive Dialogue Evaluation Track | | 0
Deconstruct to Reconstruct a Configurable Evaluation Metric for Open-Domain Dialogue Systems | Code | 0
GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems | Code | 1
Page 3 of 4

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MDD-Eval | Spearman Correlation | 0.51 | | Unverified
2 | Lin-Reg (all) | Spearman Correlation | 0.49 | | Unverified
3 | USR | Spearman Correlation | 0.42 | | Unverified
4 | USR - DR (x = c) | Spearman Correlation | 0.32 | | Unverified
5 | USR - MLM | Spearman Correlation | 0.31 | | Unverified
6 | USR - DR (x = f) | Spearman Correlation | 0.14 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | Lin-Reg (all) | Spearman Correlation | 0.54 | | Unverified
2 | USR - DR (x = c) | Spearman Correlation | 0.48 | | Unverified
3 | USR | Spearman Correlation | 0.47 | | Unverified
4 | USR - MLM | Spearman Correlation | 0.08 | | Unverified
5 | USR - DR (x = f) | Spearman Correlation | -0.05 | | Unverified
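Both tables report Spearman rank correlation, which measures how well a metric's ranking of dialogue responses agrees with the human ranking (1.0 = identical ordering, 0 = no rank agreement, negative = inverted, as in the -0.05 row). A minimal sketch of the computation, assuming no tied scores (ties would require average ranks); the example inputs are hypothetical, not taken from any listed paper:

```python
# Spearman rank correlation, tie-free case:
# rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), where d_i is the
# difference between the ranks of item i under the two scorings.

def ranks(xs):
    # Rank 1 = smallest value; assumes all values are distinct.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(a, b):
    n = len(a)
    ra, rb = ranks(a), ranks(b)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical human quality scores vs. an automatic metric's scores
# for five responses:
human  = [1, 2, 3, 4, 5]
metric = [2, 1, 4, 3, 5]
print(spearman(human, metric))  # → 0.8
```

In practice, evaluation papers compute this with `scipy.stats.spearmanr`, which also handles ties and reports a p-value.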