SOTAVerified

Dialogue Evaluation

Papers

Showing 41–50 of 97 papers

Title | Status | Hype
Improving Automated Evaluation of Open Domain Dialog via Diverse Reference Augmentation | Code | 0
Investigating Evaluation of Open-Domain Dialogue Systems With Human Generated Multiple References | Code | 0
MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation | Code | 0
Measuring the Robustness of Reference-Free Dialogue Evaluation Systems | Code | 0
MEDAL: A Framework for Benchmarking LLMs as Multilingual Open-Domain Chatbots and Dialogue Evaluators | Code | 0
Methods for Recognizing Nested Terms | Code | 0
PairEval: Open-domain Dialogue Evaluation with Pairwise Comparison | Code | 0
Predictive Engagement: An Efficient Metric For Automatic Evaluation of Open-Domain Dialogue Systems | Code | 0
Proxy Indicators for the Quality of Open-domain Dialogues | Code | 0
RuOpinionNE-2024: Extraction of Opinion Tuples from Russian News Texts | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MDD-Eval | Spearman Correlation | 0.51 | | Unverified
2 | Lin-Reg (all) | Spearman Correlation | 0.49 | | Unverified
3 | USR | Spearman Correlation | 0.42 | | Unverified
4 | USR - DR (x = c) | Spearman Correlation | 0.32 | | Unverified
5 | USR - MLM | Spearman Correlation | 0.31 | | Unverified
6 | USR - DR (x = f) | Spearman Correlation | 0.14 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Lin-Reg (all) | Spearman Correlation | 0.54 | | Unverified
2 | USR - DR (x = c) | Spearman Correlation | 0.48 | | Unverified
3 | USR | Spearman Correlation | 0.47 | | Unverified
4 | USR - MLM | Spearman Correlation | 0.08 | | Unverified
5 | USR - DR (x = f) | Spearman Correlation | -0.05 | | Unverified