| SelF-Eval: Self-supervised Fine-grained Dialogue Evaluation | Aug 17, 2022 | Contrastive Learning, Dialogue Evaluation | Code Available | 0 | 5 |
| Simple LLM Prompting is State-of-the-Art for Robust and Multilingual Dialogue Evaluation | Aug 31, 2023 | Dialogue Evaluation | Code Available | 0 | 5 |
| SLIDE: A Framework Integrating Small and Large Language Models for Open-Domain Dialogues Evaluation | May 24, 2024 | Contrastive Learning, Dialogue Evaluation | Code Available | 0 | 5 |
| Soda-Eval: Open-Domain Dialogue Evaluation in the age of LLMs | Aug 20, 2024 | Dialogue Evaluation | Code Available | 0 | 5 |
| Emphasising Structured Information: Integrating Abstract Meaning Representation into LLMs for Enhanced Open-Domain Dialogue Evaluation | Apr 1, 2024 | Abstract Meaning Representation, Dialogue Evaluation | Code Available | 0 | 5 |
| Synthesizing Adversarial Negative Responses for Robust Response Ranking and Evaluation | Jun 10, 2021 | Binary Classification, Dialogue Evaluation | Code Available | 0 | 5 |
| Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses | Aug 23, 2017 | Dialogue Evaluation | Code Available | 0 | 5 |
| Towards Multilingual Automatic Dialogue Evaluation | Aug 31, 2023 | Dialogue Evaluation, Machine Translation | Code Available | 0 | 5 |
| Transformers for Headline Selection for Russian News Clusters | Jun 19, 2021 | Dialogue Evaluation, Sentence | Code Available | 0 | 5 |
| What is wrong with you?: Leveraging User Sentiment for Automatic Dialog Evaluation | Mar 25, 2022 | Dialogue Evaluation, Open-Domain Dialog | Code Available | 0 | 5 |
| Towards Best Experiment Design for Evaluating Dialogue System Output | Sep 23, 2019 | Dialogue Evaluation | Code Available | 0 | 5 |
| MME-CRS: Multi-Metric Evaluation Based on Correlation Re-Scaling for Evaluating Open-Domain Dialogue | Jun 19, 2022 | Dialogue Evaluation, MME | Unverified | 0 | 0 |
| One "Ruler" for All Languages: Multi-Lingual Dialogue Evaluation with Adversarial Multi-Task Learning | May 8, 2018 | All, Dialogue Evaluation | Unverified | 0 | 0 |
| On the Benchmarking of LLMs for Open-Domain Dialogue Evaluation | Jul 4, 2024 | Benchmarking, Chatbot | Unverified | 0 | 0 |
| U-NEED: A Fine-grained Dataset for User Needs-Centric E-commerce Conversational Recommendation | May 5, 2023 | Conversational Recommendation, Dialogue Evaluation | Unverified | 0 | 0 |
| PoE: a Panel of Experts for Generalized Automatic Dialogue Assessment | Dec 18, 2022 | Data Augmentation, Dialogue Evaluation | Unverified | 0 | 0 |
| Dialogue You Can Trust: Human and AI Perspectives on Generated Conversations | Sep 3, 2024 | Dialogue Evaluation | Unverified | 0 | 0 |
| Pragmatically Appropriate Diversity for Dialogue Evaluation | Apr 6, 2023 | Dialogue Evaluation, Diversity | Unverified | 0 | 0 |
| Predicting Ratings of Real Dialogue Participants from Artificial Data and Ratings of Human Dialogue Observers | May 1, 2020 | Dialogue Evaluation | Unverified | 0 | 0 |
| ACUTE-EVAL: Improved Dialogue Evaluation with Optimized Questions and Multi-turn Comparisons | Sep 6, 2019 | Dialogue Evaluation | Unverified | 0 | 0 |
| User Response and Sentiment Prediction for Automatic Dialogue Evaluation | Nov 16, 2021 | Dialogue Evaluation, Open-Domain Dialog | Unverified | 0 | 0 |
| Dialogue Evaluation with Offline Reinforcement Learning | Sep 2, 2022 | Dialogue Evaluation, Offline RL | Unverified | 0 | 0 |
| RADE: Reference-Assisted Dialogue Evaluation for Open-Domain Dialogue | Sep 15, 2023 | Dialogue Evaluation, Multi-Task Learning | Unverified | 0 | 0 |
| Re-evaluating ADEM: A Deeper Look at Scoring Dialogue Responses | Feb 23, 2019 | Dialogue Evaluation, Response Generation | Unverified | 0 | 0 |
| Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges | Mar 18, 2022 | Dialogue Evaluation | Unverified | 0 | 0 |