| PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit | May 20, 2022 | All, Automatic Speech Recognition (ASR) | Code Available | 6 |
| High-Fidelity Simultaneous Speech-To-Speech Translation | Feb 5, 2025 | Decoder, Simultaneous Speech-to-Speech Translation | Code Available | 5 |
| OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia | Jan 23, 2025 | Emotion Recognition, Event Detection | Code Available | 3 |
| Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation | Jun 14, 2024 | Audio-Visual Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 3 |
| A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Speech Translation | Jun 11, 2024 | Decoder, Simultaneous Speech-to-Speech Translation | Code Available | 2 |
| LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT | Oct 7, 2023 | Audio captioning, Automatic Speech Recognition | Code Available | 2 |
| SeamlessM4T: Massively Multilingual & Multimodal Machine Translation | Aug 22, 2023 | Automatic Speech Recognition, Machine Translation | Code Available | 2 |
| SONAR: Sentence-Level Multimodal and Language-Agnostic Representations | Aug 22, 2023 | Decoder, Machine Translation | Code Available | 2 |
| MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation | Mar 1, 2023 | Audio-Visual Speech Recognition, Robust Speech Recognition | Code Available | 2 |
| CVSS Corpus and Massively Multilingual Speech-to-Speech Translation | Jan 11, 2022 | Sentence, Speech-to-Speech Translation | Code Available | 2 |
| Speech Model Pre-training for End-to-End Spoken Language Understanding | Apr 7, 2019 | Speech-to-Text, Spoken Language Understanding | Code Available | 2 |
| Audio Jailbreak Attacks: Exposing Vulnerabilities in SpeechGPT in a White-Box Framework | May 24, 2025 | Adversarial Attack, Speech Tokenization | Code Available | 1 |
| Enhancing Speech-to-Speech Dialogue Modeling with End-to-End Retrieval-Augmented Generation | Apr 27, 2025 | RAG, Retrieval | Code Available | 1 |
| MEDIBENG WHISPER TINY: A FINE-TUNED CODE-SWITCHED BENGALI-ENGLISH TRANSLATOR FOR CLINICAL APPLICATIONS | Apr 25, 2025 | Clinical Language Translation, Machine Translation | Code Available | 1 |
| DuplexMamba: Enhancing Real-time Speech Conversations with Duplex and Streaming Capabilities | Feb 16, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| WhiSPA: Semantically and Psychologically Aligned Whisper with Self-Supervised Contrastive and Student-Teacher Learning | Jan 15, 2025 | cross-modal alignment, Language Modeling | Code Available | 1 |
| Fine-tuning Whisper on Low-Resource Languages for Real-World Applications | Dec 20, 2024 | Form, Sentence | Code Available | 1 |
| STTATTS: Unified Speech-To-Text And Text-To-Speech Model | Oct 24, 2024 | Multi-Task Learning, speech-recognition | Code Available | 1 |
| Denial-of-Service Poisoning Attacks against Large Language Models | Oct 14, 2024 | 16k, Speech-to-Text | Code Available | 1 |
| OpenOmni: A Collaborative Open Source Tool for Building Future-Ready Multimodal Conversational Agents | Aug 6, 2024 | Benchmarking, Retrieval-augmented Generation | Code Available | 1 |
| LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models | Jul 22, 2024 | Data Augmentation, Language Modeling | Code Available | 1 |
| Indoor Air Quality Dataset with Activities of Daily Living in Low to Middle-income Communities | Jul 19, 2024 | Imputation, Recommendation Systems | Code Available | 1 |
| ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs | Jun 26, 2024 | ArzEn Code-switched Translation to ara, ArzEn Code-switched Translation to eng | Code Available | 1 |
| Automatic speech recognition for the Nepali language using CNN, bidirectional LSTM and ResNet | Jun 25, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Revisiting Interpolation Augmentation for Speech-to-Text Generation | Jun 22, 2024 | Speech-to-Text, Text Generation | Code Available | 1 |