SOTAVerified

Dialogue Evaluation

Papers

Showing 51-97 of 97 papers

| Title | Status | Hype |
|---|---|---|
| Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents | | 0 |
| MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation | Code | 0 |
| User Response and Sentiment Prediction for Automatic Dialogue Evaluation | | 0 |
| Automatic Evaluation and Moderation of Open-domain Dialogue Systems | Code | 1 |
| GCDF1: A Goal- and Context- Driven F-Score for Evaluating User Models | Code | 0 |
| Proxy Indicators for the Quality of Open-domain Dialogues | Code | 0 |
| Investigating the Impact of Pre-trained Language Models on Dialog Evaluation | | 0 |
| Achieving Reliable Human Assessment of Open-Domain Dialogue Systems | | 0 |
| Enhancing the Open-Domain Dialogue Evaluation in Latent Space | | 0 |
| A Human-machine Collaborative Framework for Evaluating Malevolence in Dialogues | Code | 0 |
| Transformers for Headline Selection for Russian News Clusters | Code | 0 |
| Synthesizing Adversarial Negative Responses for Robust Response Ranking and Evaluation | Code | 0 |
| A Comprehensive Assessment of Dialog Evaluation Metrics | Code | 1 |
| Improving Automated Evaluation of Open Domain Dialog via Diverse Reference Augmentation | Code | 0 |
| Conversations Are Not Flat: Modeling the Dynamic Information Flow across Dialogue Utterances | Code | 1 |
| DynaEval: Unifying Turn and Dialogue Level Evaluation | Code | 1 |
| Generating Negative Samples by Manipulating Golden Responses for Unsupervised Learning of a Response Evaluation Model | Code | 0 |
| Towards Quantifiable Dialogue Coherence Evaluation | Code | 1 |
| Assessing Dialogue Systems with Distribution Distances | Code | 1 |
| DCH-2: A Parallel Customer-Helpdesk Dialogue Corpus with Distributions of Annotators' Labels | | 0 |
| Q^2: Evaluating Factual Consistency in Knowledge-Grounded Dialogues via Question Generation and Question Answering | Code | 1 |
| An Adversarially-Learned Turing Test for Dialog Generation Models | Code | 0 |
| WeChat AI & ICT's Submission for DSTC9 Interactive Dialogue Evaluation Track | | 0 |
| Deconstruct to Reconstruct a Configurable Evaluation Metric for Open-Domain Dialogue Systems | Code | 0 |
| GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems | Code | 1 |
| Improving Dialog Evaluation with a Multi-reference Adversarial Dataset and Large Scale Pretraining | Code | 1 |
| Towards Holistic and Automatic Evaluation of Open-Domain Dialogue Generation | Code | 1 |
| Unsupervised Evaluation of Interactive Dialog with DialoGPT | Code | 1 |
| Treating Dialogue Quality Evaluation as an Anomaly Detection Problem | | 0 |
| Predicting Ratings of Real Dialogue Participants from Artificial Data and Ratings of Human Dialogue Observers | | 0 |
| Learning the Human Judgment for the Automatic Evaluation of Chatbot | | 0 |
| Learning an Unreferenced Metric for Online Dialogue Evaluation | Code | 1 |
| USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation | Code | 1 |
| PONE: A Novel Automatic Evaluation Metric for Open-Domain Generative Dialogue Systems | Code | 1 |
| How to Evaluate the Next System: Automatic Dialogue Evaluation from the Perspective of Continual Learning | | 0 |
| Predictive Engagement: An Efficient Metric For Automatic Evaluation of Open-Domain Dialogue Systems | Code | 0 |
| Towards Best Experiment Design for Evaluating Dialogue System Output | Code | 0 |
| ACUTE-EVAL: Improved Dialogue Evaluation with Optimized Questions and Multi-turn Comparisons | | 0 |
| Investigating Evaluation of Open-Domain Dialogue Systems With Human Generated Multiple References | Code | 0 |
| Approximating Interactive Human Evaluation with Self-Play for Open-Domain Dialog Systems | Code | 0 |
| Better Automatic Evaluation of Open-Domain Dialogue Systems with Contextualized Embeddings | | 0 |
| Evaluating Coherence in Dialogue Systems using Entailment | Code | 0 |
| Re-evaluating ADEM: A Deeper Look at Scoring Dialogue Responses | | 0 |
| One "Ruler" for All Languages: Multi-Lingual Dialogue Evaluation with Adversarial Multi-Task Learning | | 0 |
| Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses | Code | 0 |
| Adversarial Learning for Neural Dialogue Generation | Code | 0 |
| RUBER: An Unsupervised Method for Automatic Evaluation of Open-Domain Dialog Systems | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MDD-Eval | Spearman Correlation | 0.51 | | Unverified |
| 2 | Lin-Reg (all) | Spearman Correlation | 0.49 | | Unverified |
| 3 | USR | Spearman Correlation | 0.42 | | Unverified |
| 4 | USR - DR (x = c) | Spearman Correlation | 0.32 | | Unverified |
| 5 | USR - MLM | Spearman Correlation | 0.31 | | Unverified |
| 6 | USR - DR (x = f) | Spearman Correlation | 0.14 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Lin-Reg (all) | Spearman Correlation | 0.54 | | Unverified |
| 2 | USR - DR (x = c) | Spearman Correlation | 0.48 | | Unverified |
| 3 | USR | Spearman Correlation | 0.47 | | Unverified |
| 4 | USR - MLM | Spearman Correlation | 0.08 | | Unverified |
| 5 | USR - DR (x = f) | Spearman Correlation | -0.05 | | Unverified |
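
The benchmark results above rank models by Spearman correlation, which for dialogue metrics usually means the rank correlation between a metric's scores and human ratings of the same responses. The following is a minimal sketch of that computation in plain Python, not the scoring code used for any listed entry. The function names `average_ranks` and `spearman`, and the sample scores, are hypothetical and invented purely for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the average ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (norm_x * norm_y)


# Hypothetical data: automatic metric scores and human ratings
# for the same four responses (not taken from any benchmark).
metric_scores = [0.2, 0.9, 0.5, 0.7]
human_ratings = [1, 5, 3, 4]
rho = spearman(metric_scores, human_ratings)
```

Because only the ordering matters, a metric can reach a high Spearman correlation even when its raw scores are on a different scale from the human ratings.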