| DRE: An Effective Dual-Refined Method for Integrating Small and Large Language Models in Open-Domain Dialogue Evaluation | Jun 4, 2025 | Dialogue Evaluation | Unverified | 0 |
| MEDAL: A Framework for Benchmarking LLMs as Multilingual Open-Domain Chatbots and Dialogue Evaluators | May 28, 2025 | Benchmarking, Chatbot | Code Available | 0 |
| MARS-Bench: A Multi-turn Athletic Real-world Scenario Benchmark for Dialogue Evaluation | May 27, 2025 | Dialogue Evaluation | Unverified | 0 |
| LeCoDe: A Benchmark Dataset for Interactive Legal Consultation Dialogue Evaluation | May 26, 2025 | Dialogue Evaluation | Unverified | 0 |
| Methods for Recognizing Nested Terms | Apr 22, 2025 | Dialogue Evaluation, Named-Entity Recognition | Code Available | 0 |
| RuOpinionNE-2024: Extraction of Opinion Tuples from Russian News Texts | Apr 9, 2025 | Dialogue Evaluation, Language Modeling | Code Available | 0 |
| Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language Models | Apr 7, 2025 | Dialogue Evaluation, Fairness | Code Available | 2 |
| BoK: Introducing Bag-of-Keywords Loss for Interpretable Dialogue Response Generation | Jan 17, 2025 | Decoder, Dialogue Evaluation | Code Available | 0 |
| Measuring the Robustness of Reference-Free Dialogue Evaluation Systems | Jan 12, 2025 | Dialogue Evaluation, TAG | Code Available | 0 |
| Dialogue You Can Trust: Human and AI Perspectives on Generated Conversations | Sep 3, 2024 | Dialogue Evaluation | Unverified | 0 |