SOTAVerified

Dialogue Evaluation

Papers

Showing 76-97 of 97 papers

| Title | Status | Hype |
| --- | --- | --- |
| DCH-2: A Parallel Customer-Helpdesk Dialogue Corpus with Distributions of Annotators' Labels | | 0 |
| FlowEval: A Consensus-Based Dialogue Evaluation Framework Using Segment Act Flows | | 0 |
| How to Choose How to Choose Your Chatbot: A Massively Multi-System MultiReference Data Set for Dialog Metric Evaluation | | 0 |
| How to Evaluate the Next System: Automatic Dialogue Evaluation from the Perspective of Continual Learning | | 0 |
| Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents | | 0 |
| Better Automatic Evaluation of Open-Domain Dialogue Systems with Contextualized Embeddings | | 0 |
| Explaining Dialogue Evaluation Metrics using Adversarial Behavioral Analysis | | 0 |
| Improving Open-Domain Dialogue Evaluation with a Causal Inference Model | | 0 |
| Enhancing the Open-Domain Dialogue Evaluation in Latent Space | | 0 |
| CodingTeachLLM: Empowering LLM's Coding Ability via AST Prior Knowledge | | 0 |
| Investigating the Impact of Pre-trained Language Models on Dialog Evaluation | | 0 |
| Joint Goal Segmentation and Goal Success Prediction on Multi-Domain Conversations | | 0 |
| DRE: An Effective Dual-Refined Method for Integrating Small and Large Language Models in Open-Domain Dialogue Evaluation | | 0 |
| Learning the Human Judgment for the Automatic Evaluation of Chatbot | | 0 |
| LeCoDe: A Benchmark Dataset for Interactive Legal Consultation Dialogue Evaluation | | 0 |
| Leveraging LLMs for Dialogue Quality Measurement | | 0 |
| LLM as a Scorer: The Impact of Output Order on Dialogue Evaluation | | 0 |
| MARS-Bench: A Multi-turn Athletic Real-world Scenario Benchmark for Dialogue Evaluation | | 0 |
| Achieving Reliable Human Assessment of Open-Domain Dialogue Systems | | 0 |
| AdaCoach: A Virtual Coach for Training Customer Service Agents | | 0 |
| WeChat AI & ICT's Submission for DSTC9 Interactive Dialogue Evaluation Track | | 0 |
| Treating Dialogue Quality Evaluation as an Anomaly Detection Problem | | 0 |

Benchmark Results

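| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MDD-Eval | Spearman Correlation | 0.51 | | Unverified |
| 2 | Lin-Reg (all) | Spearman Correlation | 0.49 | | Unverified |
| 3 | USR | Spearman Correlation | 0.42 | | Unverified |
| 4 | USR - DR (x = c) | Spearman Correlation | 0.32 | | Unverified |
| 5 | USR - MLM | Spearman Correlation | 0.31 | | Unverified |
| 6 | USR - DR (x = f) | Spearman Correlation | 0.14 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Lin-Reg (all) | Spearman Correlation | 0.54 | | Unverified |
| 2 | USR - DR (x = c) | Spearman Correlation | 0.48 | | Unverified |
| 3 | USR | Spearman Correlation | 0.47 | | Unverified |
| 4 | USR - MLM | Spearman Correlation | 0.08 | | Unverified |
| 5 | USR - DR (x = f) | Spearman Correlation | -0.05 | | Unverified |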