| Q^2: Evaluating Factual Consistency in Knowledge-Grounded Dialogues via Question Generation and Question Answering | Apr 16, 2021 | Abstractive Text Summarization, Dialogue Evaluation | Code Available | 1 |
| An Adversarially-Learned Turing Test for Dialog Generation Models | Apr 16, 2021 | Dialogue Evaluation | Code Available | 0 |
| WeChat AI & ICT's Submission for DSTC9 Interactive Dialogue Evaluation Track | Jan 20, 2021 | Dialogue Evaluation, Language Modeling | Unverified | 0 |
| Deconstruct to Reconstruct a Configurable Evaluation Metric for Open-Domain Dialogue Systems | Nov 1, 2020 | Dialogue Evaluation, Semantic Similarity | Code Available | 0 |
| GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems | Oct 8, 2020 | Dialogue Evaluation | Code Available | 1 |
| Improving Dialog Evaluation with a Multi-reference Adversarial Dataset and Large Scale Pretraining | Sep 23, 2020 | Dialogue Evaluation | Code Available | 1 |
| Towards Holistic and Automatic Evaluation of Open-Domain Dialogue Generation | Jul 1, 2020 | Dialogue Evaluation, Dialogue Generation | Code Available | 1 |
| Unsupervised Evaluation of Interactive Dialog with DialoGPT | Jun 23, 2020 | Dialogue Evaluation, Open-Domain Dialog | Code Available | 1 |
| Treating Dialogue Quality Evaluation as an Anomaly Detection Problem | May 1, 2020 | Anomaly Detection, Dialogue Evaluation | Unverified | 0 |
| Predicting Ratings of Real Dialogue Participants from Artificial Data and Ratings of Human Dialogue Observers | May 1, 2020 | Dialogue Evaluation | Unverified | 0 |