| Towards Holistic and Automatic Evaluation of Open-Domain Dialogue Generation | Jul 1, 2020 | Dialogue Evaluation, Dialogue Generation | Code Available | 1 | 5 |
| DEnsity: Open-domain Dialogue Evaluation Metric using Density Estimation | May 8, 2023 | Contrastive Learning, Density Estimation | Code Available | 1 | 5 |
| DialogBench: Evaluating LLMs as Human-like Dialogue Systems | Nov 3, 2023 | Dialogue Evaluation | Code Available | 1 | 5 |
| Assessing Dialogue Systems with Distribution Distances | May 6, 2021 | Dialogue Evaluation | Code Available | 1 | 5 |
| Generating Negative Samples by Manipulating Golden Responses for Unsupervised Learning of a Response Evaluation Model | Jun 1, 2021 | Dialogue Evaluation | Code Available | 0 | 5 |
| GCDF1: A Goal- and Context- Driven F-Score for Evaluating User Models | Nov 1, 2021 | Dialogue Evaluation, Task-Oriented Dialogue Systems | Code Available | 0 | 5 |
| Achieving Reliable Human Assessment of Open-Domain Dialogue Systems | Mar 11, 2022 | Dialogue Evaluation | Code Available | 0 | 5 |
| Exploring the Impact of Human Evaluator Group on Chat-Oriented Dialogue Evaluation | Sep 14, 2023 | Chatbot, Dialogue Evaluation | Code Available | 0 | 5 |
| ECoh: Turn-level Coherence Evaluation for Multilingual Dialogues | Jul 16, 2024 | Coherence Evaluation, Dialogue Evaluation | Code Available | 0 | 5 |
| Deconstruct to Reconstruct a Configurable Evaluation Metric for Open-Domain Dialogue Systems | Nov 1, 2020 | Dialogue Evaluation, Semantic Similarity | Code Available | 0 | 5 |