SOTAVerified

Dialogue Evaluation

Papers

Showing 1–10 of 97 papers

Title | Status | Hype
Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language Models | Code | 2
DynaEval: Unifying Turn and Dialogue Level Evaluation | Code | 1
DEnsity: Open-domain Dialogue Evaluation Metric using Density Estimation | Code | 1
Automatic Evaluation and Moderation of Open-domain Dialogue Systems | Code | 1
DialogBench: Evaluating LLMs as Human-like Dialogue Systems | Code | 1
Don't Forget Your ABC's: Evaluating the State-of-the-Art in Chat-Oriented Dialogue Systems | Code | 1
Conversations Are Not Flat: Modeling the Dynamic Information Flow across Dialogue Utterances | Code | 1
Assessing Dialogue Systems with Distribution Distances | Code | 1
A Comprehensive Assessment of Dialog Evaluation Metrics | Code | 1
Findings of the The RuATD Shared Task 2022 on Artificial Text Detection in Russian | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MDD-Eval | Spearman Correlation | 0.51 | - | Unverified
2 | Lin-Reg (all) | Spearman Correlation | 0.49 | - | Unverified
3 | USR | Spearman Correlation | 0.42 | - | Unverified
4 | USR - DR (x = c) | Spearman Correlation | 0.32 | - | Unverified
5 | USR - MLM | Spearman Correlation | 0.31 | - | Unverified
6 | USR - DR (x = f) | Spearman Correlation | 0.14 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Lin-Reg (all) | Spearman Correlation | 0.54 | - | Unverified
2 | USR - DR (x = c) | Spearman Correlation | 0.48 | - | Unverified
3 | USR | Spearman Correlation | 0.47 | - | Unverified
4 | USR - MLM | Spearman Correlation | 0.08 | - | Unverified
5 | USR - DR (x = f) | Spearman Correlation | -0.05 | - | Unverified
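
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a Spearman correlation between an automatic dialogue metric and human judgments might be computed. It is an illustration only, not the evaluation code behind any listed result; the function names (`average_ranks`, `spearman`) and the sample scores are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return cov / sqrt(var_x * var_y)

# Hypothetical data: metric scores and human ratings for five responses.
metric_scores = [0.91, 0.40, 0.75, 0.22, 0.63]
human_ratings = [5, 2, 3, 1, 4]
rho = spearman(metric_scores, human_ratings)
print(round(rho, 2))
```

A value near 1 means the metric orders responses much as human raters do, a value near 0 means little agreement in ordering, and a negative value (such as the -0.05 in the second table) means the metric's ordering tends slightly to oppose the human one.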