| How to Connect Speech Foundation Models and Large Language Models? What Matters and What Does Not | Sep 25, 2024 | Automatic Speech Recognition, Speech Recognition | Unverified | 0 |
| Toward Automated Clinical Transcriptions | Sep 20, 2024 | Speech-to-Text | Unverified | 0 |
| On the Feasibility of Fully AI-automated Vishing Attacks | Sep 20, 2024 | Large Language Model, Speech-to-Text | Unverified | 0 |
| Ideal-LLM: Integrating Dual Encoders and Language-Adapted LLM for Multilingual Speech-to-Text | Sep 17, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Optimizing Rare Word Accuracy in Direct Speech Translation with a Retrieval-and-Demonstration Approach | Sep 13, 2024 | In-Context Learning, Retrieval | Code Available | 0 |
| Evaluation of real-time transcriptions using end-to-end ASR models | Sep 9, 2024 | Action Detection, Activity Detection | Unverified | 0 |
| LAST: Language Model Aware Speech Tokenization | Sep 5, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| AI-Based IVR | Aug 20, 2024 | Speech Synthesis, Speech-to-Text | Unverified | 0 |
| CMU's IWSLT 2024 Simultaneous Speech Translation System | Aug 14, 2024 | Decoder, Speech-to-Text | Unverified | 0 |
| OpenOmni: A Collaborative Open Source Tool for Building Future-Ready Multimodal Conversational Agents | Aug 6, 2024 | Benchmarking, Retrieval-augmented Generation | Code Available | 1 |