SOTAVerified

Dialogue Evaluation

Papers

Showing 31–40 of 97 papers

| Title | Status | Hype |
| --- | --- | --- |
| BoK: Introducing Bag-of-Keywords Loss for Interpretable Dialogue Response Generation | Code | 0 |
| Measuring the Robustness of Reference-Free Dialogue Evaluation Systems | Code | 0 |
| Dialogue You Can Trust: Human and AI Perspectives on Generated Conversations | | 0 |
| Soda-Eval: Open-Domain Dialogue Evaluation in the age of LLMs | Code | 0 |
| ECoh: Turn-level Coherence Evaluation for Multilingual Dialogues | Code | 0 |
| On the Benchmarking of LLMs for Open-Domain Dialogue Evaluation | | 0 |
| Leveraging LLMs for Dialogue Quality Measurement | | 0 |
| LLM as a Scorer: The Impact of Output Order on Dialogue Evaluation | | 0 |
| SLIDE: A Framework Integrating Small and Large Language Models for Open-Domain Dialogues Evaluation | Code | 0 |
| PairEval: Open-domain Dialogue Evaluation with Pairwise Comparison | Code | 0 |

Benchmark Results

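| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MDD-Eval | Spearman Correlation | 0.51 | | Unverified |
| 2 | Lin-Reg (all) | Spearman Correlation | 0.49 | | Unverified |
| 3 | USR | Spearman Correlation | 0.42 | | Unverified |
| 4 | USR - DR (x = c) | Spearman Correlation | 0.32 | | Unverified |
| 5 | USR - MLM | Spearman Correlation | 0.31 | | Unverified |
| 6 | USR - DR (x = f) | Spearman Correlation | 0.14 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Lin-Reg (all) | Spearman Correlation | 0.54 | | Unverified |
| 2 | USR - DR (x = c) | Spearman Correlation | 0.48 | | Unverified |
| 3 | USR | Spearman Correlation | 0.47 | | Unverified |
| 4 | USR - MLM | Spearman Correlation | 0.08 | | Unverified |
| 5 | USR - DR (x = f) | Spearman Correlation | -0.05 | | Unverified |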