SOTAVerified

Dialogue Evaluation

Papers

Showing 76–97 of 97 papers

Title | Status | Hype
Synthesizing Adversarial Negative Responses for Robust Response Ranking and Evaluation | Code | 0
Improving Automated Evaluation of Open Domain Dialog via Diverse Reference Augmentation | Code | 0
Generating Negative Samples by Manipulating Golden Responses for Unsupervised Learning of a Response Evaluation Model | Code | 0
DCH-2: A Parallel Customer-Helpdesk Dialogue Corpus with Distributions of Annotators' Labels | - | 0
An Adversarially-Learned Turing Test for Dialog Generation Models | Code | 0
WeChat AI & ICT's Submission for DSTC9 Interactive Dialogue Evaluation Track | - | 0
Deconstruct to Reconstruct a Configurable Evaluation Metric for Open-Domain Dialogue Systems | Code | 0
Predicting Ratings of Real Dialogue Participants from Artificial Data and Ratings of Human Dialogue Observers | - | 0
Learning the Human Judgment for the Automatic Evaluation of Chatbot | - | 0
Treating Dialogue Quality Evaluation as an Anomaly Detection Problem | - | 0
How to Evaluate the Next System: Automatic Dialogue Evaluation from the Perspective of Continual Learning | - | 0
Predictive Engagement: An Efficient Metric For Automatic Evaluation of Open-Domain Dialogue Systems | Code | 0
Towards Best Experiment Design for Evaluating Dialogue System Output | Code | 0
ACUTE-EVAL: Improved Dialogue Evaluation with Optimized Questions and Multi-turn Comparisons | - | 0
Investigating Evaluation of Open-Domain Dialogue Systems With Human Generated Multiple References | Code | 0
Approximating Interactive Human Evaluation with Self-Play for Open-Domain Dialog Systems | Code | 0
Better Automatic Evaluation of Open-Domain Dialogue Systems with Contextualized Embeddings | - | 0
Evaluating Coherence in Dialogue Systems using Entailment | Code | 0
Re-evaluating ADEM: A Deeper Look at Scoring Dialogue Responses | - | 0
One "Ruler" for All Languages: Multi-Lingual Dialogue Evaluation with Adversarial Multi-Task Learning | - | 0
Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses | Code | 0
Adversarial Learning for Neural Dialogue Generation | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MDD-Eval | Spearman Correlation | 0.51 | - | Unverified
2 | Lin-Reg (all) | Spearman Correlation | 0.49 | - | Unverified
3 | USR | Spearman Correlation | 0.42 | - | Unverified
4 | USR - DR (x = c) | Spearman Correlation | 0.32 | - | Unverified
5 | USR - MLM | Spearman Correlation | 0.31 | - | Unverified
6 | USR - DR (x = f) | Spearman Correlation | 0.14 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Lin-Reg (all) | Spearman Correlation | 0.54 | - | Unverified
2 | USR - DR (x = c) | Spearman Correlation | 0.48 | - | Unverified
3 | USR | Spearman Correlation | 0.47 | - | Unverified
4 | USR - MLM | Spearman Correlation | 0.08 | - | Unverified
5 | USR - DR (x = f) | Spearman Correlation | -0.05 | - | Unverified