| NusaCrowd: Open Source Initiative for Indonesian NLP Resources | Dec 19, 2022 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 2 |
| BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric | Dec 16, 2022 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 2 |
| Towards A Unified Conformer Structure: from ASR to ASV Task | Nov 14, 2022 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 2 |
| CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement | Sep 22, 2022 | Audio Super-ResolutionAutomatic Speech Recognition | CodeCode Available | 2 |
| SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning | Jun 16, 2022 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 2 |
| Squeezeformer: An Efficient Transformer for Automatic Speech Recognition | Jun 2, 2022 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 2 |
| 4-bit Conformer with Native Quantization Aware Training for Speech Recognition | Mar 29, 2022 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 2 |
| CMGAN: Conformer-based Metric GAN for Speech Enhancement | Mar 28, 2022 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 2 |
| Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction | Jan 5, 2022 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 2 |
| Robust Self-Supervised Audio-Visual Speech Recognition | Jan 5, 2022 | Audio-Visual Speech RecognitionAutomatic Speech Recognition | CodeCode Available | 2 |
| Fast Transformers with Clustered Attention | Jul 9, 2020 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 2 |
| Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities | May 23, 2025 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 1 |
| From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition | May 22, 2025 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 1 |
| Whisper-LM: Improving ASR Models with Language Models for Low-Resource Languages | Mar 30, 2025 | Automatic Speech RecognitionLanguage Modeling | CodeCode Available | 1 |
| DuplexMamba: Enhancing Real-time Speech Conversations with Duplex and Streaming Capabilities | Feb 16, 2025 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 1 |
| VINP: Variational Bayesian Inference with Neural Speech Prior for Joint ASR-Effective Speech Dereverberation and Blind RIR Identification | Feb 11, 2025 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 1 |
| Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models | Feb 9, 2025 | Audio-Visual Speech RecognitionAutomatic Speech Recognition | CodeCode Available | 1 |
| Sagalee: an Open Source Automatic Speech Recognition Dataset for Oromo Language | Feb 1, 2025 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 1 |
| FlanEC: Exploring Flan-T5 for Post-ASR Error Correction | Jan 22, 2025 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 1 |
| Large Language Models Are Read/Write Policy-Makers for Simultaneous Generation | Jan 1, 2025 | Automatic Speech RecognitionDecoder | CodeCode Available | 1 |
| MathSpeech: Leveraging Small LMs for Accurate Conversion in Mathematical Speech-to-Formula | Dec 20, 2024 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 1 |
| XLSR-Mamba: A Dual-Column Bidirectional State Space Model for Spoofing Attack Detection | Nov 15, 2024 | Audio Deepfake DetectionAutomatic Speech Recognition | CodeCode Available | 1 |
| Enhancing Multimodal Sentiment Analysis for Missing Modality through Self-Distillation and Unified Modality Cross-Attention | Oct 19, 2024 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 1 |
| VHASR: A Multimodal Speech Recognition System With Vision Hotwords | Oct 1, 2024 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 1 |
| Mamba for Streaming ASR Combined with Unimodal Aggregation | Sep 30, 2024 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 1 |
| SER Evals: In-domain and Out-of-domain Benchmarking for Speech Emotion Recognition | Aug 14, 2024 | Automatic Speech RecognitionBenchmarking | CodeCode Available | 1 |
| LI-TTA: Language Informed Test-Time Adaptation for Automatic Speech Recognition | Aug 11, 2024 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 1 |
| ALIF: Low-Cost Adversarial Audio Attacks on Black-Box Speech Platforms using Linguistic Features | Aug 3, 2024 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 1 |
| Evolutionary Prompt Design for LLM-Based Post-ASR Error Correction | Jul 23, 2024 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 1 |
| Framework for Curating Speech Datasets and Evaluating ASR Systems: A Case Study for Polish | Jul 18, 2024 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 1 |
| Controlling Whisper: Universal Acoustic Adversarial Attacks to Control Speech Foundation Models | Jul 5, 2024 | Adversarial AttackAutomatic Speech Recognition | CodeCode Available | 1 |
| Improving Self-supervised Pre-training using Accent-Specific Codebooks | Jul 4, 2024 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 1 |
| Pinyin Regularization in Error Correction for Chinese Speech Recognition with Large Language Models | Jul 2, 2024 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 1 |
| ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs | Jun 26, 2024 | ArzEn Code-switched Translation to araArzEn Code-switched Translation to eng | CodeCode Available | 1 |
| Towards Building an End-to-End Multilingual Automatic Lyrics Transcription Model | Jun 25, 2024 | Automatic Lyrics TranscriptionAutomatic Speech Recognition | CodeCode Available | 1 |
| Automatic speech recognition for the Nepali language using CNN, bidirectional LSTM and ResNet | Jun 25, 2024 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 1 |
| Continual Test-time Adaptation for End-to-end Speech Recognition on Noisy Speech | Jun 16, 2024 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 1 |
| LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition | Jun 6, 2024 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 1 |
| A Variance-Preserving Interpolation Approach for Diffusion Models with Applications to Single Channel Speech Enhancement and Recognition | May 27, 2024 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 1 |
| SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset | May 12, 2024 | Action SpottingAutomatic Speech Recognition | CodeCode Available | 1 |
| Muting Whisper: A Universal Acoustic Adversarial Attack on Speech Foundation Models | May 9, 2024 | Adversarial AttackAutomatic Speech Recognition | CodeCode Available | 1 |
| Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets | May 3, 2024 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 1 |
| Less Peaky and More Accurate CTC Forced Alignment by Label Priors | Apr 22, 2024 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 1 |
| Kallaama: A Transcribed Speech Dataset about Agriculture in the Three Most Widely Spoken Languages in Senegal | Apr 2, 2024 | Automatic Speech Recognitionspeech-recognition | CodeCode Available | 1 |
| Speech Robust Bench: A Robustness Benchmark For Speech Recognition | Mar 8, 2024 | Adversarial RobustnessAutomatic Speech Recognition | CodeCode Available | 1 |
| Language and Speech Technology for Central Kurdish Varieties | Mar 4, 2024 | Automatic Speech RecognitionDiversity | CodeCode Available | 1 |
| A Cross-Modal Approach to Silent Speech with LLM-Enhanced Recognition | Mar 2, 2024 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 1 |
| It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition | Feb 8, 2024 | Audio-Visual Speech RecognitionAutomatic Speech Recognition | CodeCode Available | 1 |
| REBORN: Reinforcement-Learned Boundary Segmentation with Iterative Training for Unsupervised ASR | Feb 6, 2024 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 1 |
| Word-Level ASR Quality Estimation for Efficient Corpus Sampling and Post-Editing through Analyzing Attentions of a Reference-Free Metric | Jan 20, 2024 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 1 |