SOTAVerified

Dialogue Evaluation

Papers

Showing 71–80 of 97 papers

| Title | Status | Hype |
|---|---|---|
| Q^2: Evaluating Factual Consistency in Knowledge-Grounded Dialogues via Question Generation and Question Answering | Code | 1 |
| An Adversarially-Learned Turing Test for Dialog Generation Models | Code | 0 |
| WeChat AI & ICT's Submission for DSTC9 Interactive Dialogue Evaluation Track | | 0 |
| Deconstruct to Reconstruct a Configurable Evaluation Metric for Open-Domain Dialogue Systems | Code | 0 |
| GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems | Code | 1 |
| Improving Dialog Evaluation with a Multi-reference Adversarial Dataset and Large Scale Pretraining | Code | 1 |
| Towards Holistic and Automatic Evaluation of Open-Domain Dialogue Generation | Code | 1 |
| Unsupervised Evaluation of Interactive Dialog with DialoGPT | Code | 1 |
| Treating Dialogue Quality Evaluation as an Anomaly Detection Problem | | 0 |
| Predicting Ratings of Real Dialogue Participants from Artificial Data and Ratings of Human Dialogue Observers | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MDD-Eval | Spearman Correlation | 0.51 | | Unverified |
| 2 | Lin-Reg (all) | Spearman Correlation | 0.49 | | Unverified |
| 3 | USR | Spearman Correlation | 0.42 | | Unverified |
| 4 | USR - DR (x = c) | Spearman Correlation | 0.32 | | Unverified |
| 5 | USR - MLM | Spearman Correlation | 0.31 | | Unverified |
| 6 | USR - DR (x = f) | Spearman Correlation | 0.14 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Lin-Reg (all) | Spearman Correlation | 0.54 | | Unverified |
| 2 | USR - DR (x = c) | Spearman Correlation | 0.48 | | Unverified |
| 3 | USR | Spearman Correlation | 0.47 | | Unverified |
| 4 | USR - MLM | Spearman Correlation | 0.08 | | Unverified |
| 5 | USR - DR (x = f) | Spearman Correlation | -0.05 | | Unverified |
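
The benchmark results above are all reported as Spearman correlations. As a rough illustration of what that number measures, the sketch below computes a Spearman correlation between a set of automatic metric scores and human ratings using only the Python standard library. The function names (`average_ranks`, `spearman`) and the sample data are hypothetical and chosen for illustration; they are not taken from any of the listed papers, and the papers' own evaluation scripts may differ (for example in how ties or aggregation levels are handled).

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / sqrt(var_x * var_y)

# Hypothetical data: automatic metric scores and human ratings for five responses.
metric_scores = [0.91, 0.40, 0.75, 0.10, 0.62]
human_ratings = [5, 2, 3, 1, 4]
rho = spearman(metric_scores, human_ratings)
print(f"Spearman correlation: {rho:.2f}")
```

A value near 1 means the metric orders responses much as human raters do, a value near 0 means no monotonic relationship, and a negative value (as in the last row of the second table) means the metric tends to rank responses in the opposite order.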