| Enhancing Speech Large Language Models with Prompt-Aware Mixture of Audio Encoders | Feb 21, 2025 | Audio captioning, Automatic Speech Recognition | Unverified | 0 |
| WavRAG: Audio-Integrated Retrieval Augmented Generation for Spoken Dialogue Models | Feb 20, 2025 | Automatic Speech Recognition, RAG | Unverified | 0 |
| Measuring the Effect of Transcription Noise on Downstream Language Understanding Tasks | Feb 19, 2025 | Automatic Speech Recognition, speech-recognition | Code Available | 0 |
| Adopting Whisper for Confidence Estimation | Feb 19, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Gesture-Aware Zero-Shot Speech Recognition for Patients with Language Disorders | Feb 18, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Lost in Transcription, Found in Distribution Shift: Demystifying Hallucination in Speech Foundation Models | Feb 18, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Speech-FT: Merging Pre-trained And Fine-Tuned Speech Representation Models For Cross-Task Generalization | Feb 18, 2025 | Automatic Speech Recognition, Speaker Identification | Unverified | 0 |
| Benchmarking Automatic Speech Recognition coupled LLM Modules for Medical Diagnostics | Feb 18, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Microphone Array Geometry Independent Multi-Talker Distant ASR: NTT System for the DASR Task of the CHiME-8 Challenge | Feb 14, 2025 | Action Detection, Activity Detection | Unverified | 0 |
| MTLM: Incorporating Bidirectional Text Information to Enhance Language Model Training in Speech Recognition Systems | Feb 14, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |