| CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training | May 23, 2025 | Automatic Speech Recognition, Emotion Recognition | Code Available | 11 | 5 |
| GLM-4-Voice: Towards Intelligent and Human-Like End-to-End Spoken Chatbot | Dec 3, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 7 | 5 |
| Scaling Speech-Text Pre-training with Synthetic Interleaved Data | Nov 26, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 7 | 5 |
| OxfordVGG Submission to the EGO4D AV Transcription Challenge | Jul 18, 2023 | Automatic Speech Recognition, speech-recognition | Code Available | 6 | 5 |
| FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration | Jan 24, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 5 | 5 |
| VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model | May 6, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 4 | 5 |
| Dolphin: A Large-Scale Automatic Speech Recognition Model for Eastern Languages | Mar 26, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 4 | 5 |
| GigaAM: Efficient Self-Supervised Learner for Speech Recognition | Jun 1, 2025 | Automatic Speech Recognition, Language Modeling | Code Available | 4 | 5 |
| SpeechColab Leaderboard: An Open-Source Platform for Automatic Speech Recognition Evaluation | Mar 13, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 4 | 5 |
| SALMONN: Towards Generic Hearing Abilities for Large Language Models | Oct 20, 2023 | Audio captioning, Automatic Speech Recognition | Code Available | 3 | 5 |
| Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models | May 23, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 3 | 5 |
| A Parallelizable Lattice Rescoring Strategy with Neural Language Models | Mar 8, 2021 | ARC, Automatic Speech Recognition | Code Available | 3 | 5 |
| Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play | May 5, 2025 | AI Agent, Automatic Speech Recognition | Code Available | 3 | 5 |
| PhoWhisper: Automatic Speech Recognition for Vietnamese | Mar 27, 2024 | Automatic Speech Recognition, speech-recognition | Code Available | 3 | 5 |
| Sentiment Reasoning for Healthcare | Jul 24, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 3 | 5 |
| Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates | Sep 27, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 3 | 5 |
| DiarizationLM: Speaker Diarization Post-Processing with Large Language Models | Jan 7, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 3 | 5 |
| Delay-penalized transducer for low-latency streaming ASR | Oct 31, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 3 | 5 |
| MooER: LLM-based Speech Recognition and Translation Models from Moore Threads | Aug 9, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 3 | 5 |
| WhisperNER: Unified Open Named Entity and Speech Recognition | Sep 12, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 3 | 5 |
| TED-LIUM 3: twice as much data and corpus repartition for experiments on speaker adaptation | May 12, 2018 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 3 | 5 |
| VoiceBench: Benchmarking LLM-Based Voice Assistants | Oct 22, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 3 | 5 |
| Conformer: Convolution-augmented Transformer for Speech Recognition | May 16, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 3 | 5 |
| SeamlessM4T: Massively Multilingual & Multimodal Machine Translation | Aug 22, 2023 | Automatic Speech Recognition, Machine Translation | Code Available | 2 | 5 |
| Recent Advances in Speech Language Models: A Survey | Oct 1, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2 | 5 |
| SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning | Jun 16, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2 | 5 |
| 4-bit Conformer with Native Quantization Aware Training for Speech Recognition | Mar 29, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2 | 5 |
| Robust Self-Supervised Audio-Visual Speech Recognition | Jan 5, 2022 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available | 2 | 5 |
| AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension | Feb 12, 2024 | 2k, Automatic Speech Recognition | Code Available | 2 | 5 |
| Pretraining End-to-End Keyword Search with Automatically Discovered Acoustic Units | Jul 5, 2024 | Acoustic Unit Discovery, Automatic Speech Recognition | Code Available | 2 | 5 |
| MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages | Oct 1, 2024 | Automatic Speech Recognition, speech-recognition | Code Available | 2 | 5 |
| NusaCrowd: Open Source Initiative for Indonesian NLP Resources | Dec 19, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2 | 5 |
| PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings | Mar 4, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2 | 5 |
| PromptASR for contextualized ASR with controllable style | Sep 14, 2023 | Automatic Speech Recognition, speech-recognition | Code Available | 2 | 5 |
| Let's Fuse Step by Step: A Generative Fusion Decoding Algorithm with LLMs for Multi-modal Text Recognition | May 23, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2 | 5 |
| Large Language Models are Strong Audio-Visual Speech Recognition Learners | Sep 18, 2024 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available | 2 | 5 |
| LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of end-to-end ASR Models | Oct 4, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2 | 5 |
| FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec | Sep 14, 2023 | Automatic Speech Recognition, speech-recognition | Code Available | 2 | 5 |
| Fast Transformers with Clustered Attention | Jul 9, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2 | 5 |
| LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation | Feb 27, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2 | 5 |
| Dialectal Coverage And Generalization in Arabic Speech Recognition | Nov 7, 2024 | Arabic Speech Recognition, Automatic Speech Recognition | Code Available | 2 | 5 |
| DiCoW: Diarization-Conditioned Whisper for Target Speaker Automatic Speech Recognition | Dec 30, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2 | 5 |
| Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions | Sep 13, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2 | 5 |
| Large Language Models are Efficient Learners of Noise-Robust Speech Recognition | Jan 19, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2 | 5 |
| LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT | Oct 7, 2023 | Audio captioning, Automatic Speech Recognition | Code Available | 2 | 5 |
| Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction | Jan 5, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2 | 5 |
| An Embarrassingly Simple Approach for LLM with Strong ASR Capacity | Feb 13, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2 | 5 |
| CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement | Sep 22, 2022 | Audio Super-Resolution, Automatic Speech Recognition | Code Available | 2 | 5 |
| BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric | Dec 16, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2 | 5 |
| Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels | Mar 25, 2023 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available | 2 | 5 |