| Leveraging LLMs for Dialogue Quality Measurement | Jun 25, 2024 | Dialogue Evaluation | —Unverified | 0 | 0 |
| LLM as a Scorer: The Impact of Output Order on Dialogue Evaluation | Jun 5, 2024 | Dialogue EvaluationSensitivity | —Unverified | 0 | 0 |
| MARS-Bench: A Multi-turn Athletic Real-world Scenario Benchmark for Dialogue Evaluation | May 27, 2025 | Dialogue Evaluation | —Unverified | 0 | 0 |
| Achieving Reliable Human Assessment of Open-Domain Dialogue Systems | Sep 17, 2021 | Dialogue Evaluation | —Unverified | 0 | 0 |
| AdaCoach: A Virtual Coach for Training Customer Service Agents | Apr 27, 2022 | Dialogue Evaluation | —Unverified | 0 | 0 |
| WeChat AI & ICT's Submission for DSTC9 Interactive Dialogue Evaluation Track | Jan 20, 2021 | Dialogue EvaluationLanguage Modeling | —Unverified | 0 | 0 |
| Treating Dialogue Quality Evaluation as an Anomaly Detection Problem | May 1, 2020 | Anomaly DetectionDialogue Evaluation | —Unverified | 0 | 0 |