SOTAVerified

Dialogue Evaluation

Papers

Showing 81-90 of 97 papers

| Title | Status | Hype |
|---|---|---|
| Learning the Human Judgment for the Automatic Evaluation of Chatbot | | 0 |
| Learning an Unreferenced Metric for Online Dialogue Evaluation | Code | 1 |
| USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation | Code | 1 |
| PONE: A Novel Automatic Evaluation Metric for Open-Domain Generative Dialogue Systems | Code | 1 |
| How to Evaluate the Next System: Automatic Dialogue Evaluation from the Perspective of Continual Learning | | 0 |
| Predictive Engagement: An Efficient Metric For Automatic Evaluation of Open-Domain Dialogue Systems | Code | 0 |
| Towards Best Experiment Design for Evaluating Dialogue System Output | Code | 0 |
| ACUTE-EVAL: Improved Dialogue Evaluation with Optimized Questions and Multi-turn Comparisons | | 0 |
| Investigating Evaluation of Open-Domain Dialogue Systems With Human Generated Multiple References | Code | 0 |
| Approximating Interactive Human Evaluation with Self-Play for Open-Domain Dialog Systems | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MDD-Eval | Spearman Correlation | 0.51 | | Unverified |
| 2 | Lin-Reg (all) | Spearman Correlation | 0.49 | | Unverified |
| 3 | USR | Spearman Correlation | 0.42 | | Unverified |
| 4 | USR - DR (x = c) | Spearman Correlation | 0.32 | | Unverified |
| 5 | USR - MLM | Spearman Correlation | 0.31 | | Unverified |
| 6 | USR - DR (x = f) | Spearman Correlation | 0.14 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Lin-Reg (all) | Spearman Correlation | 0.54 | | Unverified |
| 2 | USR - DR (x = c) | Spearman Correlation | 0.48 | | Unverified |
| 3 | USR | Spearman Correlation | 0.47 | | Unverified |
| 4 | USR - MLM | Spearman Correlation | 0.08 | | Unverified |
| 5 | USR - DR (x = f) | Spearman Correlation | -0.05 | | Unverified |