SOTAVerified

Dialogue Evaluation

Papers

Showing 1–10 of 97 papers

| Title | Status | Hype |
| DRE: An Effective Dual-Refined Method for Integrating Small and Large Language Models in Open-Domain Dialogue Evaluation | | 0 |
| MEDAL: A Framework for Benchmarking LLMs as Multilingual Open-Domain Chatbots and Dialogue Evaluators | Code | 0 |
| MARS-Bench: A Multi-turn Athletic Real-world Scenario Benchmark for Dialogue Evaluation | | 0 |
| LeCoDe: A Benchmark Dataset for Interactive Legal Consultation Dialogue Evaluation | | 0 |
| Methods for Recognizing Nested Terms | Code | 0 |
| RuOpinionNE-2024: Extraction of Opinion Tuples from Russian News Texts | Code | 0 |
| Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language Models | Code | 2 |
| BoK: Introducing Bag-of-Keywords Loss for Interpretable Dialogue Response Generation | Code | 0 |
| Measuring the Robustness of Reference-Free Dialogue Evaluation Systems | Code | 0 |
| Dialogue You Can Trust: Human and AI Perspectives on Generated Conversations | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| 1 | MDD-Eval | Spearman Correlation | 0.51 | | Unverified |
| 2 | Lin-Reg (all) | Spearman Correlation | 0.49 | | Unverified |
| 3 | USR | Spearman Correlation | 0.42 | | Unverified |
| 4 | USR - DR (x = c) | Spearman Correlation | 0.32 | | Unverified |
| 5 | USR - MLM | Spearman Correlation | 0.31 | | Unverified |
| 6 | USR - DR (x = f) | Spearman Correlation | 0.14 | | Unverified |