SOTAVerified

Dialogue Evaluation

Papers

Showing 61–70 of 97 papers

| Title | Status | Hype |
|---|---|---|
| What is wrong with you?: Leveraging User Sentiment for Automatic Dialog Evaluation | Code | 0 |
| Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges | | 0 |
| DEAM: Dialogue Coherence Evaluation using AMR-based Semantic Manipulations | Code | 0 |
| Achieving Reliable Human Assessment of Open-Domain Dialogue Systems | Code | 0 |
| FlowEval: A Consensus-Based Dialogue Evaluation Framework Using Segment Act Flows | | 0 |
| Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents | | 0 |
| MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation | Code | 0 |
| User Response and Sentiment Prediction for Automatic Dialogue Evaluation | | 0 |
| GCDF1: A Goal- and Context-Driven F-Score for Evaluating User Models | Code | 0 |
| Proxy Indicators for the Quality of Open-domain Dialogues | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MDD-Eval | Spearman Correlation | 0.51 | | Unverified |
| 2 | Lin-Reg (all) | Spearman Correlation | 0.49 | | Unverified |
| 3 | USR | Spearman Correlation | 0.42 | | Unverified |
| 4 | USR - DR (x = c) | Spearman Correlation | 0.32 | | Unverified |
| 5 | USR - MLM | Spearman Correlation | 0.31 | | Unverified |
| 6 | USR - DR (x = f) | Spearman Correlation | 0.14 | | Unverified |
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Lin-Reg (all) | Spearman Correlation | 0.54 | | Unverified |
| 2 | USR - DR (x = c) | Spearman Correlation | 0.48 | | Unverified |
| 3 | USR | Spearman Correlation | 0.47 | | Unverified |
| 4 | USR - MLM | Spearman Correlation | 0.08 | | Unverified |
| 5 | USR - DR (x = f) | Spearman Correlation | -0.05 | | Unverified |
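The benchmark tables above rank each metric by Spearman correlation, i.e. how well the metric's ordering of dialogue quality agrees with human judgments. As a minimal stdlib-only sketch (the scores below are hypothetical, not drawn from any listed paper):

```python
# Spearman correlation = Pearson correlation computed on rank-transformed data.
# This is the agreement measure reported in the benchmark tables above.

def ranks(xs):
    """Average 1-based ranks, with ties assigned the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # extend j over any run of tied values
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation between two equal-length score lists."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

metric_scores = [0.2, 0.5, 0.9, 0.1, 0.7]  # hypothetical automatic metric scores
human_scores = [1, 3, 5, 2, 4]             # hypothetical human quality ratings
print(round(spearman(metric_scores, human_scores), 2))  # 0.9
```

A value near 1 (as in the top rows of the tables) means the metric orders dialogues almost exactly as humans do; values near 0 or below, like the -0.05 entry, indicate no agreement or a slight inversion.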