| SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words | Jun 19, 2024 | Dialogue Understanding | Code Available | 2 |
| PromptCBLUE: A Chinese Prompt Tuning Benchmark for the Medical Domain | Oct 22, 2023 | Dialogue Generation, Dialogue Understanding | Code Available | 2 |
| Utterance-level Dialogue Understanding: An Empirical Study | Sep 29, 2020 | Dialogue Understanding, Goal-Oriented Dialogue Systems | Code Available | 2 |
| DialSim: A Real-Time Simulator for Evaluating Long-Term Multi-Party Dialogue Understanding of Conversational Agents | Jun 19, 2024 | Dialogue Understanding, Question Answering | Code Available | 1 |
| MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions | May 29, 2024 | Benchmarking, Dialogue Understanding | Code Available | 1 |
| Revisit Input Perturbation Problems for LLMs: A Unified Robustness Evaluation Framework for Noisy Slot Filling Task | Oct 10, 2023 | Data Augmentation, Dialogue Understanding | Code Available | 1 |
| VSTAR: A Video-grounded Dialogue Dataset for Situated Semantic Understanding with Scene and Topic Transitions | May 30, 2023 | Dialogue Generation, Dialogue Understanding | Code Available | 1 |
| Medical Dialogue Generation via Dual Flow Modeling | May 29, 2023 | Dialogue Generation, Dialogue Understanding | Code Available | 1 |
| Hierarchical Dialogue Understanding with Special Tokens and Turn-level Attention | Apr 29, 2023 | Dialogue Act Classification, Dialogue Understanding | Code Available | 1 |
| ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications | Nov 8, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |