SOTAVerified

Dialogue Evaluation

Papers

Showing 41–50 of 97 papers

| Title | Status | Hype |
|---|---|---|
| PoE: a Panel of Experts for Generalized Automatic Dialogue Assessment | | 0 |
| Pragmatically Appropriate Diversity for Dialogue Evaluation | | 0 |
| Predicting Ratings of Real Dialogue Participants from Artificial Data and Ratings of Human Dialogue Observers | | 0 |
| Dialogue Evaluation with Offline Reinforcement Learning | | 0 |
| RADE: Reference-Assisted Dialogue Evaluation for Open-Domain Dialogue | | 0 |
| Re-evaluating ADEM: A Deeper Look at Scoring Dialogue Responses | | 0 |
| Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges | | 0 |
| Dialogue You Can Trust: Human and AI Perspectives on Generated Conversations | | 0 |
| DRE: An Effective Dual-Refined Method for Integrating Small and Large Language Models in Open-Domain Dialogue Evaluation | | 0 |
| Enhancing the Open-Domain Dialogue Evaluation in Latent Space | | 0 |
Page 5 of 10

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MDD-Eval | Spearman Correlation | 0.51 | | Unverified |
| 2 | Lin-Reg (all) | Spearman Correlation | 0.49 | | Unverified |
| 3 | USR | Spearman Correlation | 0.42 | | Unverified |
| 4 | USR - DR (x = c) | Spearman Correlation | 0.32 | | Unverified |
| 5 | USR - MLM | Spearman Correlation | 0.31 | | Unverified |
| 6 | USR - DR (x = f) | Spearman Correlation | 0.14 | | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Lin-Reg (all) | Spearman Correlation | 0.54 | | Unverified |
| 2 | USR - DR (x = c) | Spearman Correlation | 0.48 | | Unverified |
| 3 | USR | Spearman Correlation | 0.47 | | Unverified |
| 4 | USR - MLM | Spearman Correlation | 0.08 | | Unverified |
| 5 | USR - DR (x = f) | Spearman Correlation | -0.05 | | Unverified |
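All of the scores above are Spearman rank correlations between a metric's automatic scores and human quality ratings. As a minimal sketch of how such a number is computed (the score and rating values below are illustrative, not taken from the leaderboard):

```python
def ranks(xs):
    """Assign 1-based ranks, averaging ranks across ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical example: metric scores vs. human ratings per dialogue response.
metric_scores = [0.9, 0.2, 0.6, 0.4, 0.8]
human_ratings = [5, 1, 4, 2, 4]
rho = spearman(metric_scores, human_ratings)
```

In practice these correlations are typically computed with `scipy.stats.spearmanr`; the pure-Python version above just makes the rank-then-correlate logic explicit.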