| Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents | Jan 12, 2022 | Dialogue Evaluation, Sensitivity | Unverified | 0 |
| MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation | Dec 14, 2021 | Dialogue Evaluation | Code Available | 0 |
| User Response and Sentiment Prediction for Automatic Dialogue Evaluation | Nov 16, 2021 | Dialogue Evaluation, Open-Domain Dialog | Unverified | 0 |
| Automatic Evaluation and Moderation of Open-domain Dialogue Systems | Nov 3, 2021 | Chatbot, Dialogue Evaluation | Code Available | 1 |
| GCDF1: A Goal- and Context- Driven F-Score for Evaluating User Models | Nov 1, 2021 | Dialogue Evaluation, Task-Oriented Dialogue Systems | Code Available | 0 |
| Proxy Indicators for the Quality of Open-domain Dialogues | Nov 1, 2021 | Dialogue Evaluation | Code Available | 0 |
| Investigating the Impact of Pre-trained Language Models on Dialog Evaluation | Oct 5, 2021 | Dialogue Evaluation, Open-Domain Dialog | Unverified | 0 |
| Achieving Reliable Human Assessment of Open-Domain Dialogue Systems | Sep 17, 2021 | Dialogue Evaluation | Unverified | 0 |
| Enhancing the Open-Domain Dialogue Evaluation in Latent Space | Aug 1, 2021 | Dialogue Evaluation | Unverified | 0 |
| A Human-machine Collaborative Framework for Evaluating Malevolence in Dialogues | Aug 1, 2021 | Dialogue Evaluation | Code Available | 0 |