SOTAVerified

Dialogue Evaluation

Papers

Showing 21-30 of 97 papers

| Title | Status | Hype |
| --- | --- | --- |
| GLM-Dialog: Noise-tolerant Pre-training for Knowledge-grounded Dialogue Generation | Code | 1 |
| DEnsity: Open-domain Dialogue Evaluation Metric using Density Estimation | Code | 1 |
| DialogBench: Evaluating LLMs as Human-like Dialogue Systems | Code | 1 |
| Assessing Dialogue Systems with Distribution Distances | Code | 1 |
| AdaCoach: A Virtual Coach for Training Customer Service Agents | | 0 |
| DRE: An Effective Dual-Refined Method for Integrating Small and Large Language Models in Open-Domain Dialogue Evaluation | | 0 |
| Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents | | 0 |
| Dialogue You Can Trust: Human and AI Perspectives on Generated Conversations | | 0 |
| Better Automatic Evaluation of Open-Domain Dialogue Systems with Contextualized Embeddings | | 0 |
| CodingTeachLLM: Empowering LLM's Coding Ability via AST Prior Knowledge | | 0 |
Page 3 of 10

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MDD-Eval | Spearman Correlation | 0.51 | | Unverified |
| 2 | Lin-Reg (all) | Spearman Correlation | 0.49 | | Unverified |
| 3 | USR | Spearman Correlation | 0.42 | | Unverified |
| 4 | USR - DR (x = c) | Spearman Correlation | 0.32 | | Unverified |
| 5 | USR - MLM | Spearman Correlation | 0.31 | | Unverified |
| 6 | USR - DR (x = f) | Spearman Correlation | 0.14 | | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Lin-Reg (all) | Spearman Correlation | 0.54 | | Unverified |
| 2 | USR - DR (x = c) | Spearman Correlation | 0.48 | | Unverified |
| 3 | USR | Spearman Correlation | 0.47 | | Unverified |
| 4 | USR - MLM | Spearman Correlation | 0.08 | | Unverified |
| 5 | USR - DR (x = f) | Spearman Correlation | -0.05 | | Unverified |
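All of the benchmark entries above report Spearman correlation, i.e. how well the ranking induced by an automatic metric's scores agrees with the ranking induced by human quality judgments. As a minimal sketch of how such a number is produced, the pure-Python snippet below ranks two score lists (with average ranks for ties) and takes the Pearson correlation of the ranks. The score lists are invented illustrative data, not values from any of the papers listed.

```python
def rank(values):
    """Assign 1-based ranks, averaging ranks over tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the block of indices tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank for the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Invented example: an automatic metric's scores vs. human ratings
# for five dialogue responses.
metric_scores = [0.2, 0.8, 0.5, 0.9, 0.1]
human_ratings = [1, 4, 3, 5, 2]
print(round(spearman(metric_scores, human_ratings), 2))  # prints 0.9
```

A value near 1 means the metric orders responses almost exactly as humans do; values near 0 (like USR - MLM's 0.08 in the second table) mean the metric's ranking is essentially uncorrelated with human judgment.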