| Investigating Evaluation of Open-Domain Dialogue Systems With Human Generated Multiple References | Jul 24, 2019 | Dialogue Evaluation, Diversity | Code Available | 0 | 5 |
| PairEval: Open-domain Dialogue Evaluation with Pairwise Comparison | Apr 1, 2024 | Dialogue Evaluation | Code Available | 0 | 5 |
| Deconstruct to Reconstruct a Configurable Evaluation Metric for Open-Domain Dialogue Systems | Nov 1, 2020 | Dialogue Evaluation, Semantic Similarity | Code Available | 0 | 5 |
| Approximating Interactive Human Evaluation with Self-Play for Open-Domain Dialog Systems | Jun 21, 2019 | Dialogue Evaluation, Knowledge Distillation | Code Available | 0 | 5 |
| DEAM: Dialogue Coherence Evaluation using AMR-based Semantic Manipulations | Mar 18, 2022 | Abstract Meaning Representation, Coherence Evaluation | Code Available | 0 | 5 |
| An Adversarially-Learned Turing Test for Dialog Generation Models | Apr 16, 2021 | Dialogue Evaluation | Code Available | 0 | 5 |
| Towards Best Experiment Design for Evaluating Dialogue System Output | Sep 23, 2019 | Dialogue Evaluation | Code Available | 0 | 5 |
| Generating Negative Samples by Manipulating Golden Responses for Unsupervised Learning of a Response Evaluation Model | Jun 1, 2021 | Dialogue Evaluation | Code Available | 0 | 5 |
| GCDF1: A Goal- and Context- Driven F-Score for Evaluating User Models | Nov 1, 2021 | Dialogue Evaluation, Task-Oriented Dialogue Systems | Code Available | 0 | 5 |
| C-PMI: Conditional Pointwise Mutual Information for Turn-level Dialogue Evaluation | Jun 27, 2023 | Dialogue Evaluation | Code Available | 0 | 5 |