SOTAVerified

Dialogue Evaluation

Papers

Showing 26–50 of 97 papers

| Title | Status | Hype |
|---|---|---|
| Towards Multilingual Automatic Dialogue Evaluation | Code | 0 |
| C-PMI: Conditional Pointwise Mutual Information for Turn-level Dialogue Evaluation | Code | 0 |
| How to Choose How to Choose Your Chatbot: A Massively Multi-System MultiReference Data Set for Dialog Metric Evaluation | | 0 |
| DEnsity: Open-domain Dialogue Evaluation Metric using Density Estimation | Code | 1 |
| U-NEED: A Fine-grained Dataset for User Needs-Centric E-commerce Conversational Recommendation | | 0 |
| Pragmatically Appropriate Diversity for Dialogue Evaluation | | 0 |
| GLM-Dialog: Noise-tolerant Pre-training for Knowledge-grounded Dialogue Generation | Code | 1 |
| Improving Open-Domain Dialogue Evaluation with a Causal Inference Model | | 0 |
| Don't Forget Your ABC's: Evaluating the State-of-the-Art in Chat-Oriented Dialogue Systems | Code | 1 |
| PoE: a Panel of Experts for Generalized Automatic Dialogue Assessment | | 0 |
| FineD-Eval: Fine-grained Automatic Dialogue-Level Evaluation | Code | 1 |
| Joint Goal Segmentation and Goal Success Prediction on Multi-Domain Conversations | | 0 |
| Dialogue Evaluation with Offline Reinforcement Learning | | 0 |
| SelF-Eval: Self-supervised Fine-grained Dialogue Evaluation | Code | 0 |
| Explaining Dialogue Evaluation Metrics using Adversarial Behavioral Analysis | | 0 |
| MME-CRS: Multi-Metric Evaluation Based on Correlation Re-Scaling for Evaluating Open-Domain Dialogue | | 0 |
| Findings of the The RuATD Shared Task 2022 on Artificial Text Detection in Russian | Code | 1 |
| InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning | Code | 1 |
| RuNNE-2022 Shared Task: Recognizing Nested Named Entities | Code | 1 |
| AdaCoach: A Virtual Coach for Training Customer Service Agents | | 0 |
| What is wrong with you?: Leveraging User Sentiment for Automatic Dialog Evaluation | Code | 0 |
| Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges | | 0 |
| DEAM: Dialogue Coherence Evaluation using AMR-based Semantic Manipulations | Code | 0 |
| Achieving Reliable Human Assessment of Open-Domain Dialogue Systems | Code | 0 |
| FlowEval: A Consensus-Based Dialogue Evaluation Framework Using Segment Act Flows | | 0 |
Page 2 of 4

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MDD-Eval | Spearman Correlation | 0.51 | | Unverified |
| 2 | Lin-Reg (all) | Spearman Correlation | 0.49 | | Unverified |
| 3 | USR | Spearman Correlation | 0.42 | | Unverified |
| 4 | USR - DR (x = c) | Spearman Correlation | 0.32 | | Unverified |
| 5 | USR - MLM | Spearman Correlation | 0.31 | | Unverified |
| 6 | USR - DR (x = f) | Spearman Correlation | 0.14 | | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Lin-Reg (all) | Spearman Correlation | 0.54 | | Unverified |
| 2 | USR - DR (x = c) | Spearman Correlation | 0.48 | | Unverified |
| 3 | USR | Spearman Correlation | 0.47 | | Unverified |
| 4 | USR - MLM | Spearman Correlation | 0.08 | | Unverified |
| 5 | USR - DR (x = f) | Spearman Correlation | -0.05 | | Unverified |
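Both benchmark tables rank metrics by Spearman correlation against human judgments. As a minimal pure-Python sketch (the `spearman` helper is illustrative, not part of this site or any of the listed papers), the statistic is just the Pearson correlation computed over average ranks, which makes it robust to monotone rescaling of either score:

```python
def spearman(xs, ys):
    """Spearman's rank correlation: Pearson correlation of average ranks."""
    def ranks(vals):
        # Sort indices by value; tied values receive their average 1-based rank.
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0.0] * len(vals)
        i = 0
        while i < len(vals):
            j = i
            while j + 1 < len(vals) and vals[order[j + 1]] == vals[order[i]]:
                j += 1
            avg = (i + j) / 2 + 1  # average of positions i..j, 1-based
            for k in range(i, j + 1):
                r[order[k]] = avg
            i = j + 1
        return r

    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

For example, a metric whose scores order systems exactly as human raters do yields 1.0, and a perfectly inverted ordering yields -1.0; the claimed values above (0.51, 0.54, etc.) indicate moderate agreement with human judgments.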