| CodingTeachLLM: Empowering LLM's Coding Ability via AST Prior Knowledge | Mar 13, 2024 | Dialogue Evaluation, HumanEval | Unverified | 0 |
| Explaining Dialogue Evaluation Metrics using Adversarial Behavioral Analysis | Jul 1, 2022 | Dialogue Evaluation | Unverified | 0 |
| Treating Dialogue Quality Evaluation as an Anomaly Detection Problem | May 1, 2020 | Anomaly Detection, Dialogue Evaluation | Unverified | 0 |
| U-NEED: A Fine-grained Dataset for User Needs-Centric E-commerce Conversational Recommendation | May 5, 2023 | Conversational Recommendation, Dialogue Evaluation | Unverified | 0 |
| User Response and Sentiment Prediction for Automatic Dialogue Evaluation | Nov 16, 2021 | Dialogue Evaluation, Open-Domain Dialog | Unverified | 0 |
| WeChat AI & ICT's Submission for DSTC9 Interactive Dialogue Evaluation Track | Jan 20, 2021 | Dialogue Evaluation, Language Modeling | Unverified | 0 |
| FlowEval: A Consensus-Based Dialogue Evaluation Framework Using Segment Act Flows | Feb 14, 2022 | Dialogue Evaluation | Unverified | 0 |
| Better Automatic Evaluation of Open-Domain Dialogue Systems with Contextualized Embeddings | Apr 24, 2019 | Dialogue Evaluation, valid | Unverified | 0 |
| How to Choose How to Choose Your Chatbot: A Massively Multi-System MultiReference Data Set for Dialog Metric Evaluation | May 23, 2023 | Chatbot, Dialogue Evaluation | Unverified | 0 |
| How to Evaluate the Next System: Automatic Dialogue Evaluation from the Perspective of Continual Learning | Dec 10, 2019 | Continual Learning, Dialogue Evaluation | Unverified | 0 |
| xDial-Eval: A Multilingual Open-Domain Dialogue Evaluation Benchmark | Oct 13, 2023 | Dialogue Evaluation, Machine Translation | Code Available | 0 |
| Achieving Reliable Human Assessment of Open-Domain Dialogue Systems | Mar 11, 2022 | Dialogue Evaluation | Code Available | 0 |
| A Comprehensive Analysis of the Effectiveness of Large Language Models as Automatic Dialogue Evaluators | Dec 24, 2023 | Dialogue Evaluation | Code Available | 0 |
| Adversarial Learning for Neural Dialogue Generation | Jan 23, 2017 | Dialogue Evaluation, Dialogue Generation | Code Available | 0 |
| A Human-machine Collaborative Framework for Evaluating Malevolence in Dialogues | Aug 1, 2021 | Dialogue Evaluation | Code Available | 0 |
| An Adversarially-Learned Turing Test for Dialog Generation Models | Apr 16, 2021 | Dialogue Evaluation | Code Available | 0 |
| Approximating Interactive Human Evaluation with Self-Play for Open-Domain Dialog Systems | Jun 21, 2019 | Dialogue Evaluation, Knowledge Distillation | Code Available | 0 |
| BoK: Introducing Bag-of-Keywords Loss for Interpretable Dialogue Response Generation | Jan 17, 2025 | Decoder, Dialogue Evaluation | Code Available | 0 |
| C-PMI: Conditional Pointwise Mutual Information for Turn-level Dialogue Evaluation | Jun 27, 2023 | Dialogue Evaluation | Code Available | 0 |
| DEAM: Dialogue Coherence Evaluation using AMR-based Semantic Manipulations | Mar 18, 2022 | Abstract Meaning Representation, Coherence Evaluation | Code Available | 0 |
| Deconstruct to Reconstruct a Configurable Evaluation Metric for Open-Domain Dialogue Systems | Nov 1, 2020 | Dialogue Evaluation, Semantic Similarity | Code Available | 0 |
| ECoh: Turn-level Coherence Evaluation for Multilingual Dialogues | Jul 16, 2024 | Coherence Evaluation, Dialogue Evaluation | Code Available | 0 |
| Evaluating Coherence in Dialogue Systems using Entailment | Apr 6, 2019 | Dialogue Evaluation, Diversity | Code Available | 0 |
| Exploring the Impact of Human Evaluator Group on Chat-Oriented Dialogue Evaluation | Sep 14, 2023 | Chatbot, Dialogue Evaluation | Code Available | 0 |
| GCDF1: A Goal- and Context- Driven F-Score for Evaluating User Models | Nov 1, 2021 | Dialogue Evaluation, Task-Oriented Dialogue Systems | Code Available | 0 |