| Improving Dialog Evaluation with a Multi-reference Adversarial Dataset and Large Scale Pretraining | Sep 23, 2020 | Dialogue Evaluation | Code Available | 1 |
| Towards Holistic and Automatic Evaluation of Open-Domain Dialogue Generation | Jul 1, 2020 | Dialogue Evaluation, Dialogue Generation | Code Available | 1 |
| Unsupervised Evaluation of Interactive Dialog with DialoGPT | Jun 23, 2020 | Dialogue Evaluation, Open-Domain Dialog | Code Available | 1 |
| Treating Dialogue Quality Evaluation as an Anomaly Detection Problem | May 1, 2020 | Anomaly Detection, Dialogue Evaluation | —Unverified | 0 |
| Predicting Ratings of Real Dialogue Participants from Artificial Data and Ratings of Human Dialogue Observers | May 1, 2020 | Dialogue Evaluation | —Unverified | 0 |
| Learning the Human Judgment for the Automatic Evaluation of Chatbot | May 1, 2020 | Chatbot, Dialogue Evaluation | —Unverified | 0 |
| Learning an Unreferenced Metric for Online Dialogue Evaluation | May 1, 2020 | Dialogue Evaluation | Code Available | 1 |
| USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation | May 1, 2020 | Dialogue Evaluation, Open-Domain Dialog | Code Available | 1 |
| PONE: A Novel Automatic Evaluation Metric for Open-Domain Generative Dialogue Systems | Apr 6, 2020 | Dialogue Evaluation | Code Available | 1 |
| How to Evaluate the Next System: Automatic Dialogue Evaluation from the Perspective of Continual Learning | Dec 10, 2019 | Continual Learning, Dialogue Evaluation | —Unverified | 0 |
| Predictive Engagement: An Efficient Metric For Automatic Evaluation of Open-Domain Dialogue Systems | Nov 4, 2019 | Dialogue Evaluation | Code Available | 0 |
| Towards Best Experiment Design for Evaluating Dialogue System Output | Sep 23, 2019 | Dialogue Evaluation | Code Available | 0 |
| ACUTE-EVAL: Improved Dialogue Evaluation with Optimized Questions and Multi-turn Comparisons | Sep 6, 2019 | Dialogue Evaluation | —Unverified | 0 |
| Investigating Evaluation of Open-Domain Dialogue Systems With Human Generated Multiple References | Jul 24, 2019 | Dialogue Evaluation, Diversity | Code Available | 0 |
| Approximating Interactive Human Evaluation with Self-Play for Open-Domain Dialog Systems | Jun 21, 2019 | Dialogue Evaluation, Knowledge Distillation | Code Available | 0 |
| Better Automatic Evaluation of Open-Domain Dialogue Systems with Contextualized Embeddings | Apr 24, 2019 | Dialogue Evaluation, valid | —Unverified | 0 |
| Evaluating Coherence in Dialogue Systems using Entailment | Apr 6, 2019 | Dialogue Evaluation, Diversity | Code Available | 0 |
| Re-evaluating ADEM: A Deeper Look at Scoring Dialogue Responses | Feb 23, 2019 | Dialogue Evaluation, Response Generation | —Unverified | 0 |
| One "Ruler" for All Languages: Multi-Lingual Dialogue Evaluation with Adversarial Multi-Task Learning | May 8, 2018 | All, Dialogue Evaluation | —Unverified | 0 |
| Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses | Aug 23, 2017 | Dialogue Evaluation | Code Available | 0 |
| Adversarial Learning for Neural Dialogue Generation | Jan 23, 2017 | Dialogue Evaluation, Dialogue Generation | Code Available | 0 |
| RUBER: An Unsupervised Method for Automatic Evaluation of Open-Domain Dialog Systems | Jan 11, 2017 | Dialogue Evaluation, Open-Domain Dialog | Code Available | 1 |