| BoK: Introducing Bag-of-Keywords Loss for Interpretable Dialogue Response Generation | Jan 17, 2025 | Decoder, Dialogue Evaluation | Code Available | 0 |
| Measuring the Robustness of Reference-Free Dialogue Evaluation Systems | Jan 12, 2025 | Dialogue Evaluation, TAG | Code Available | 0 |
| Dialogue You Can Trust: Human and AI Perspectives on Generated Conversations | Sep 3, 2024 | Dialogue Evaluation | Unverified | 0 |
| Soda-Eval: Open-Domain Dialogue Evaluation in the age of LLMs | Aug 20, 2024 | Dialogue Evaluation | Code Available | 0 |
| ECoh: Turn-level Coherence Evaluation for Multilingual Dialogues | Jul 16, 2024 | Coherence Evaluation, Dialogue Evaluation | Code Available | 0 |
| On the Benchmarking of LLMs for Open-Domain Dialogue Evaluation | Jul 4, 2024 | Benchmarking, Chatbot | Unverified | 0 |
| Leveraging LLMs for Dialogue Quality Measurement | Jun 25, 2024 | Dialogue Evaluation | Unverified | 0 |
| LLM as a Scorer: The Impact of Output Order on Dialogue Evaluation | Jun 5, 2024 | Dialogue Evaluation, Sensitivity | Unverified | 0 |
| SLIDE: A Framework Integrating Small and Large Language Models for Open-Domain Dialogues Evaluation | May 24, 2024 | Contrastive Learning, Dialogue Evaluation | Code Available | 0 |
| PairEval: Open-domain Dialogue Evaluation with Pairwise Comparison | Apr 1, 2024 | Dialogue Evaluation | Code Available | 0 |