| CodingTeachLLM: Empowering LLM's Coding Ability via AST Prior Knowledge | Mar 13, 2024 | Dialogue Evaluation, HumanEval | —Unverified | 0 |
| Dialogue Evaluation with Offline Reinforcement Learning | Sep 2, 2022 | Dialogue Evaluation, Offline RL | —Unverified | 0 |
| ACUTE-EVAL: Improved Dialogue Evaluation with Optimized Questions and Multi-turn Comparisons | Sep 6, 2019 | Dialogue Evaluation | —Unverified | 0 |
| MARS-Bench: A Multi-turn Athletic Real-world Scenario Benchmark for Dialogue Evaluation | May 27, 2025 | Dialogue Evaluation | —Unverified | 0 |
| Learning the Human Judgment for the Automatic Evaluation of Chatbot | May 1, 2020 | Chatbot, Dialogue Evaluation | —Unverified | 0 |
| DCH-2: A Parallel Customer-Helpdesk Dialogue Corpus with Distributions of Annotators' Labels | Apr 18, 2021 | Dialogue Evaluation, Machine Translation | —Unverified | 0 |
| Joint Goal Segmentation and Goal Success Prediction on Multi-Domain Conversations | Oct 1, 2022 | Dialogue Evaluation, Multi-Task Learning | —Unverified | 0 |
| LeCoDe: A Benchmark Dataset for Interactive Legal Consultation Dialogue Evaluation | May 26, 2025 | Dialogue Evaluation | —Unverified | 0 |
| Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents | Jan 12, 2022 | Dialogue Evaluation, Sensitivity | —Unverified | 0 |
| Improving Open-Domain Dialogue Evaluation with a Causal Inference Model | Jan 31, 2023 | Causal Inference, counterfactual | —Unverified | 0 |