| GLM-Dialog: Noise-tolerant Pre-training for Knowledge-grounded Dialogue Generation | Feb 28, 2023 | Dialogue Evaluation, Dialogue Generation | Code Available | 1 |
| DEnsity: Open-domain Dialogue Evaluation Metric using Density Estimation | May 8, 2023 | Contrastive Learning, Density Estimation | Code Available | 1 |
| DialogBench: Evaluating LLMs as Human-like Dialogue Systems | Nov 3, 2023 | Dialogue Evaluation | Code Available | 1 |
| Assessing Dialogue Systems with Distribution Distances | May 6, 2021 | Dialogue Evaluation | Code Available | 1 |
| AdaCoach: A Virtual Coach for Training Customer Service Agents | Apr 27, 2022 | Dialogue Evaluation | Unverified | 0 |
| DRE: An Effective Dual-Refined Method for Integrating Small and Large Language Models in Open-Domain Dialogue Evaluation | Jun 4, 2025 | Dialogue Evaluation, valid | Unverified | 0 |
| Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents | Jan 12, 2022 | Dialogue Evaluation, Sensitivity | Unverified | 0 |
| Dialogue You Can Trust: Human and AI Perspectives on Generated Conversations | Sep 3, 2024 | Dialogue Evaluation | Unverified | 0 |
| Better Automatic Evaluation of Open-Domain Dialogue Systems with Contextualized Embeddings | Apr 24, 2019 | Dialogue Evaluation, valid | Unverified | 0 |
| CodingTeachLLM: Empowering LLM's Coding Ability via AST Prior Knowledge | Mar 13, 2024 | Dialogue Evaluation, HumanEval | Unverified | 0 |