| Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language Models | Apr 7, 2025 | Dialogue Evaluation, Fairness | Code available | 2 |
| USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation | May 1, 2020 | Dialogue Evaluation, Open-Domain Dialog | Code available | 1 |
| GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems | Oct 8, 2020 | Dialogue Evaluation | Code available | 1 |
| Q²: Evaluating Factual Consistency in Knowledge-Grounded Dialogues via Question Generation and Question Answering | Apr 16, 2021 | Abstractive Text Summarization, Dialogue Evaluation | Code available | 1 |
| Towards Holistic and Automatic Evaluation of Open-Domain Dialogue Generation | Jul 1, 2020 | Dialogue Evaluation, Dialogue Generation | Code available | 1 |
| Unsupervised Evaluation of Interactive Dialog with DialoGPT | Jun 23, 2020 | Dialogue Evaluation, Open-Domain Dialog | Code available | 1 |
| InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning | May 25, 2022 | Dialogue Evaluation, Dialogue Generation | Code available | 1 |
| Improving Dialog Evaluation with a Multi-reference Adversarial Dataset and Large Scale Pretraining | Sep 23, 2020 | Dialogue Evaluation | Code available | 1 |
| Learning an Unreferenced Metric for Online Dialogue Evaluation | May 1, 2020 | Dialogue Evaluation | Code available | 1 |
| PONE: A Novel Automatic Evaluation Metric for Open-Domain Generative Dialogue Systems | Apr 6, 2020 | Dialogue Evaluation | Code available | 1 |
| GLM-Dialog: Noise-tolerant Pre-training for Knowledge-grounded Dialogue Generation | Feb 28, 2023 | Dialogue Evaluation, Dialogue Generation | Code available | 1 |
| RuNNE-2022 Shared Task: Recognizing Nested Named Entities | May 23, 2022 | Dialogue Evaluation, named-entity-recognition | Code available | 1 |
| Automatic Evaluation and Moderation of Open-domain Dialogue Systems | Nov 3, 2021 | Chatbot, Dialogue Evaluation | Code available | 1 |
| Towards Quantifiable Dialogue Coherence Evaluation | Jun 1, 2021 | Coherence Evaluation, Dialogue Evaluation | Code available | 1 |
| RUBER: An Unsupervised Method for Automatic Evaluation of Open-Domain Dialog Systems | Jan 11, 2017 | Dialogue Evaluation, Open-Domain Dialog | Code available | 1 |
| DynaEval: Unifying Turn and Dialogue Level Evaluation | Jun 2, 2021 | Dialogue Evaluation | Code available | 1 |
| Conversations Are Not Flat: Modeling the Dynamic Information Flow across Dialogue Utterances | Jun 4, 2021 | Chatbot, Dialogue Evaluation | Code available | 1 |
| Don't Forget Your ABC's: Evaluating the State-of-the-Art in Chat-Oriented Dialogue Systems | Dec 18, 2022 | Chatbot, Dialogue Evaluation | Code available | 1 |
| A Comprehensive Assessment of Dialog Evaluation Metrics | Jun 7, 2021 | Dialogue Evaluation, Response Generation | Code available | 1 |
| FineD-Eval: Fine-grained Automatic Dialogue-Level Evaluation | Oct 25, 2022 | Dialogue Evaluation | Code available | 1 |
| Findings of the The RuATD Shared Task 2022 on Artificial Text Detection in Russian | Jun 3, 2022 | Binary Classification, Dialogue Evaluation | Code available | 1 |
| DEnsity: Open-domain Dialogue Evaluation Metric using Density Estimation | May 8, 2023 | Contrastive Learning, Density Estimation | Code available | 1 |
| DialogBench: Evaluating LLMs as Human-like Dialogue Systems | Nov 3, 2023 | Dialogue Evaluation | Code available | 1 |
| Assessing Dialogue Systems with Distribution Distances | May 6, 2021 | Dialogue Evaluation | Code available | 1 |
| Better Automatic Evaluation of Open-Domain Dialogue Systems with Contextualized Embeddings | Apr 24, 2019 | Dialogue Evaluation, valid | Unverified | 0 |