| USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation | May 1, 2020 | Dialogue Evaluation, Open-Domain Dialog | Code Available | 1 |
| Learning an Unreferenced Metric for Online Dialogue Evaluation | May 1, 2020 | Dialogue Evaluation | Code Available | 1 |
| PONE: A Novel Automatic Evaluation Metric for Open-Domain Generative Dialogue Systems | Apr 6, 2020 | Dialogue Evaluation | Code Available | 1 |
| RUBER: An Unsupervised Method for Automatic Evaluation of Open-Domain Dialog Systems | Jan 11, 2017 | Dialogue Evaluation, Open-Domain Dialog | Code Available | 1 |
| DRE: An Effective Dual-Refined Method for Integrating Small and Large Language Models in Open-Domain Dialogue Evaluation | Jun 4, 2025 | Dialogue Evaluation, valid | Unverified | 0 |
| MEDAL: A Framework for Benchmarking LLMs as Multilingual Open-Domain Chatbots and Dialogue Evaluators | May 28, 2025 | Benchmarking, Chatbot | Code Available | 0 |
| MARS-Bench: A Multi-turn Athletic Real-world Scenario Benchmark for Dialogue Evaluation | May 27, 2025 | Dialogue Evaluation | Unverified | 0 |
| LeCoDe: A Benchmark Dataset for Interactive Legal Consultation Dialogue Evaluation | May 26, 2025 | Dialogue Evaluation | Unverified | 0 |
| Methods for Recognizing Nested Terms | Apr 22, 2025 | Dialogue Evaluation, named-entity-recognition | Code Available | 0 |
| RuOpinionNE-2024: Extraction of Opinion Tuples from Russian News Texts | Apr 9, 2025 | Dialogue Evaluation, Language Modeling | Code Available | 0 |