| SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words | Jun 19, 2024 | Dialogue Understanding | Code Available | 2 |
| DialSim: A Real-Time Simulator for Evaluating Long-Term Multi-Party Dialogue Understanding of Conversational Agents | Jun 19, 2024 | Dialogue Understanding, Question Answering | Code Available | 1 |
| Investigating Low-Cost LLM Annotation for Spoken Dialogue Understanding Datasets | Jun 19, 2024 | Dialogue Understanding, Language Modeling | Unverified | 0 |
| Item-Language Model for Conversational Recommendation | Jun 5, 2024 | Conversational Recommendation, Dialogue Understanding | Unverified | 0 |
| MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions | May 29, 2024 | Benchmarking, Dialogue Understanding | Code Available | 1 |
| Long-Horizon Dialogue Understanding for Role Identification in the Game of Avalon with Large Language Models | Nov 9, 2023 | Decision Making, Dialogue Understanding | Unverified | 0 |
| PromptCBLUE: A Chinese Prompt Tuning Benchmark for the Medical Domain | Oct 22, 2023 | Dialogue Generation, Dialogue Understanding | Code Available | 2 |
| From Multilingual Complexity to Emotional Clarity: Leveraging Commonsense to Unveil Emotions in Code-Mixed Dialogues | Oct 19, 2023 | Dialogue Understanding, Emotional Intelligence | Code Available | 0 |
| Revisit Input Perturbation Problems for LLMs: A Unified Robustness Evaluation Framework for Noisy Slot Filling Task | Oct 10, 2023 | Data Augmentation, Dialogue Understanding | Code Available | 1 |
| Self-Explanation Prompting Improves Dialogue Understanding in Large Language Models | Sep 22, 2023 | Dialogue Understanding | Unverified | 0 |