| Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents | Jan 12, 2022 | Dialogue Evaluation, Sensitivity | Unverified | 0 |
| MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation | Dec 14, 2021 | Dialogue Evaluation | Code Available | 0 |
| User Response and Sentiment Prediction for Automatic Dialogue Evaluation | Nov 16, 2021 | Dialogue Evaluation, Open-Domain Dialog | Unverified | 0 |
| Automatic Evaluation and Moderation of Open-domain Dialogue Systems | Nov 3, 2021 | Chatbot, Dialogue Evaluation | Code Available | 1 |
| GCDF1: A Goal- and Context- Driven F-Score for Evaluating User Models | Nov 1, 2021 | Dialogue Evaluation, Task-Oriented Dialogue Systems | Code Available | 0 |
| Proxy Indicators for the Quality of Open-domain Dialogues | Nov 1, 2021 | Dialogue Evaluation | Code Available | 0 |
| Investigating the Impact of Pre-trained Language Models on Dialog Evaluation | Oct 5, 2021 | Dialogue Evaluation, Open-Domain Dialog | Unverified | 0 |
| Achieving Reliable Human Assessment of Open-Domain Dialogue Systems | Sep 17, 2021 | Dialogue Evaluation | Unverified | 0 |
| Enhancing the Open-Domain Dialogue Evaluation in Latent Space | Aug 1, 2021 | Dialogue Evaluation | Unverified | 0 |
| A Human-machine Collaborative Framework for Evaluating Malevolence in Dialogues | Aug 1, 2021 | Dialogue Evaluation | Code Available | 0 |
| Transformers for Headline Selection for Russian News Clusters | Jun 19, 2021 | Dialogue Evaluation, Sentence | Code Available | 0 |
| Synthesizing Adversarial Negative Responses for Robust Response Ranking and Evaluation | Jun 10, 2021 | Binary Classification, Dialogue Evaluation | Code Available | 0 |
| A Comprehensive Assessment of Dialog Evaluation Metrics | Jun 7, 2021 | Dialogue Evaluation, Response Generation | Code Available | 1 |
| Improving Automated Evaluation of Open Domain Dialog via Diverse Reference Augmentation | Jun 5, 2021 | Dialogue Evaluation, Open-Domain Dialog | Code Available | 0 |
| Conversations Are Not Flat: Modeling the Dynamic Information Flow across Dialogue Utterances | Jun 4, 2021 | Chatbot, Dialogue Evaluation | Code Available | 1 |
| DynaEval: Unifying Turn and Dialogue Level Evaluation | Jun 2, 2021 | Dialogue Evaluation | Code Available | 1 |
| Generating Negative Samples by Manipulating Golden Responses for Unsupervised Learning of a Response Evaluation Model | Jun 1, 2021 | Dialogue Evaluation | Code Available | 0 |
| Towards Quantifiable Dialogue Coherence Evaluation | Jun 1, 2021 | Coherence Evaluation, Dialogue Evaluation | Code Available | 1 |
| Assessing Dialogue Systems with Distribution Distances | May 6, 2021 | Dialogue Evaluation | Code Available | 1 |
| DCH-2: A Parallel Customer-Helpdesk Dialogue Corpus with Distributions of Annotators' Labels | Apr 18, 2021 | Dialogue Evaluation, Machine Translation | Unverified | 0 |
| Q^2: Evaluating Factual Consistency in Knowledge-Grounded Dialogues via Question Generation and Question Answering | Apr 16, 2021 | Abstractive Text Summarization, Dialogue Evaluation | Code Available | 1 |
| An Adversarially-Learned Turing Test for Dialog Generation Models | Apr 16, 2021 | Dialogue Evaluation | Code Available | 0 |
| WeChat AI & ICT's Submission for DSTC9 Interactive Dialogue Evaluation Track | Jan 20, 2021 | Dialogue Evaluation, Language Modeling | Unverified | 0 |
| Deconstruct to Reconstruct a Configurable Evaluation Metric for Open-Domain Dialogue Systems | Nov 1, 2020 | Dialogue Evaluation, Semantic Similarity | Code Available | 0 |
| GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems | Oct 8, 2020 | Dialogue Evaluation | Code Available | 1 |