| U-NEED: A Fine-grained Dataset for User Needs-Centric E-commerce Conversational Recommendation | May 5, 2023 | Conversational Recommendation, Dialogue Evaluation | Unverified | 0 |
| Pragmatically Appropriate Diversity for Dialogue Evaluation | Apr 6, 2023 | Dialogue Evaluation, Diversity | Unverified | 0 |
| Improving Open-Domain Dialogue Evaluation with a Causal Inference Model | Jan 31, 2023 | Causal Inference, counterfactual | Unverified | 0 |
| PoE: a Panel of Experts for Generalized Automatic Dialogue Assessment | Dec 18, 2022 | Data Augmentation, Dialogue Evaluation | Unverified | 0 |
| Joint Goal Segmentation and Goal Success Prediction on Multi-Domain Conversations | Oct 1, 2022 | Dialogue Evaluation, Multi-Task Learning | Unverified | 0 |
| Dialogue Evaluation with Offline Reinforcement Learning | Sep 2, 2022 | Dialogue Evaluation, Offline RL | Unverified | 0 |
| SelF-Eval: Self-supervised Fine-grained Dialogue Evaluation | Aug 17, 2022 | Contrastive Learning, Dialogue Evaluation | Code Available | 0 |
| Explaining Dialogue Evaluation Metrics using Adversarial Behavioral Analysis | Jul 1, 2022 | Dialogue Evaluation | Unverified | 0 |
| MME-CRS: Multi-Metric Evaluation Based on Correlation Re-Scaling for Evaluating Open-Domain Dialogue | Jun 19, 2022 | Dialogue Evaluation, MME | Unverified | 0 |
| AdaCoach: A Virtual Coach for Training Customer Service Agents | Apr 27, 2022 | Dialogue Evaluation | Unverified | 0 |
| What is wrong with you?: Leveraging User Sentiment for Automatic Dialog Evaluation | Mar 25, 2022 | Dialogue Evaluation, Open-Domain Dialog | Code Available | 0 |
| Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges | Mar 18, 2022 | Dialogue Evaluation | Unverified | 0 |
| DEAM: Dialogue Coherence Evaluation using AMR-based Semantic Manipulations | Mar 18, 2022 | Abstract Meaning Representation, Coherence Evaluation | Code Available | 0 |
| Achieving Reliable Human Assessment of Open-Domain Dialogue Systems | Mar 11, 2022 | Dialogue Evaluation | Code Available | 0 |
| FlowEval: A Consensus-Based Dialogue Evaluation Framework Using Segment Act Flows | Feb 14, 2022 | Dialogue Evaluation | Unverified | 0 |
| Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents | Jan 12, 2022 | Dialogue Evaluation, Sensitivity | Unverified | 0 |
| MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation | Dec 14, 2021 | Dialogue Evaluation | Code Available | 0 |
| User Response and Sentiment Prediction for Automatic Dialogue Evaluation | Nov 16, 2021 | Dialogue Evaluation, Open-Domain Dialog | Unverified | 0 |
| GCDF1: A Goal- and Context- Driven F-Score for Evaluating User Models | Nov 1, 2021 | Dialogue Evaluation, Task-Oriented Dialogue Systems | Code Available | 0 |
| Proxy Indicators for the Quality of Open-domain Dialogues | Nov 1, 2021 | Dialogue Evaluation | Code Available | 0 |
| Investigating the Impact of Pre-trained Language Models on Dialog Evaluation | Oct 5, 2021 | Dialogue Evaluation, Open-Domain Dialog | Unverified | 0 |
| Achieving Reliable Human Assessment of Open-Domain Dialogue Systems | Sep 17, 2021 | Dialogue Evaluation | Unverified | 0 |
| A Human-machine Collaborative Framework for Evaluating Malevolence in Dialogues | Aug 1, 2021 | Dialogue Evaluation | Code Available | 0 |
| Enhancing the Open-Domain Dialogue Evaluation in Latent Space | Aug 1, 2021 | Dialogue Evaluation | Unverified | 0 |
| Transformers for Headline Selection for Russian News Clusters | Jun 19, 2021 | Dialogue Evaluation, Sentence | Code Available | 0 |
| Synthesizing Adversarial Negative Responses for Robust Response Ranking and Evaluation | Jun 10, 2021 | Binary Classification, Dialogue Evaluation | Code Available | 0 |
| Improving Automated Evaluation of Open Domain Dialog via Diverse Reference Augmentation | Jun 5, 2021 | Dialogue Evaluation, Open-Domain Dialog | Code Available | 0 |
| Generating Negative Samples by Manipulating Golden Responses for Unsupervised Learning of a Response Evaluation Model | Jun 1, 2021 | Dialogue Evaluation | Code Available | 0 |
| DCH-2: A Parallel Customer-Helpdesk Dialogue Corpus with Distributions of Annotators' Labels | Apr 18, 2021 | Dialogue Evaluation, Machine Translation | Unverified | 0 |
| An Adversarially-Learned Turing Test for Dialog Generation Models | Apr 16, 2021 | Dialogue Evaluation | Code Available | 0 |
| WeChat AI & ICT's Submission for DSTC9 Interactive Dialogue Evaluation Track | Jan 20, 2021 | Dialogue Evaluation, Language Modeling | Unverified | 0 |
| Deconstruct to Reconstruct a Configurable Evaluation Metric for Open-Domain Dialogue Systems | Nov 1, 2020 | Dialogue Evaluation, Semantic Similarity | Code Available | 0 |
| Predicting Ratings of Real Dialogue Participants from Artificial Data and Ratings of Human Dialogue Observers | May 1, 2020 | Dialogue Evaluation | Unverified | 0 |
| Learning the Human Judgment for the Automatic Evaluation of Chatbot | May 1, 2020 | Chatbot, Dialogue Evaluation | Unverified | 0 |
| Treating Dialogue Quality Evaluation as an Anomaly Detection Problem | May 1, 2020 | Anomaly Detection, Dialogue Evaluation | Unverified | 0 |
| How to Evaluate the Next System: Automatic Dialogue Evaluation from the Perspective of Continual Learning | Dec 10, 2019 | Continual Learning, Dialogue Evaluation | Unverified | 0 |
| Predictive Engagement: An Efficient Metric For Automatic Evaluation of Open-Domain Dialogue Systems | Nov 4, 2019 | Dialogue Evaluation | Code Available | 0 |
| Towards Best Experiment Design for Evaluating Dialogue System Output | Sep 23, 2019 | Dialogue Evaluation | Code Available | 0 |
| ACUTE-EVAL: Improved Dialogue Evaluation with Optimized Questions and Multi-turn Comparisons | Sep 6, 2019 | Dialogue Evaluation | Unverified | 0 |
| Investigating Evaluation of Open-Domain Dialogue Systems With Human Generated Multiple References | Jul 24, 2019 | Dialogue Evaluation, Diversity | Code Available | 0 |
| Approximating Interactive Human Evaluation with Self-Play for Open-Domain Dialog Systems | Jun 21, 2019 | Dialogue Evaluation, Knowledge Distillation | Code Available | 0 |
| Better Automatic Evaluation of Open-Domain Dialogue Systems with Contextualized Embeddings | Apr 24, 2019 | Dialogue Evaluation, valid | Unverified | 0 |
| Evaluating Coherence in Dialogue Systems using Entailment | Apr 6, 2019 | Dialogue Evaluation, Diversity | Code Available | 0 |
| Re-evaluating ADEM: A Deeper Look at Scoring Dialogue Responses | Feb 23, 2019 | Dialogue Evaluation, Response Generation | Unverified | 0 |
| One "Ruler" for All Languages: Multi-Lingual Dialogue Evaluation with Adversarial Multi-Task Learning | May 8, 2018 | All, Dialogue Evaluation | Unverified | 0 |
| Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses | Aug 23, 2017 | Dialogue Evaluation | Code Available | 0 |
| Adversarial Learning for Neural Dialogue Generation | Jan 23, 2017 | Dialogue Evaluation, Dialogue Generation | Code Available | 0 |