MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation | Mar 1, 2023 | Audio-Visual Speech Recognition, Robust Speech Recognition | Code Available 2
NusaCrowd: Open Source Initiative for Indonesian NLP Resources | Dec 19, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 2
BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric | Dec 16, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 2
Towards A Unified Conformer Structure: from ASR to ASV Task | Nov 14, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 2
Liquid Structural State-Space Models | Sep 26, 2022 | Heart rate estimation, Long-range modeling | Code Available 2
CMGAN: Conformer-Based Metric-GAN for Monaural Speech Enhancement | Sep 22, 2022 | Audio Super-Resolution, Automatic Speech Recognition | Code Available 2
u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality | Jul 14, 2022 | Speaker Verification, speech-recognition | Code Available 2
TEVR: Improving Speech Recognition by Token Entropy Variance Reduction | Jun 25, 2022 | Automatic Speech Recognition (ASR), Language Modeling | Code Available 2
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning | Jun 16, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 2
A New Frontier of AI: On-Device AI Training and Personalization | Jun 9, 2022 | Efficient Neural Network, speech-recognition | Code Available 2
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition | Jun 2, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 2
Vakyansh: ASR Toolkit for Low Resource Indic languages | Mar 30, 2022 | Punctuation Restoration, speech-recognition | Code Available 2
4-bit Conformer with Native Quantization Aware Training for Speech Recognition | Mar 29, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 2
CMGAN: Conformer-based Metric GAN for Speech Enhancement | Mar 28, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 2
ICASSP 2022 Acoustic Echo Cancellation Challenge | Feb 27, 2022 | Acoustic echo cancellation, Speech Enhancement | Code Available 2
Visual Speech Recognition for Multiple Languages in the Wild | Feb 26, 2022 | Hyperparameter Optimization, Lipreading | Code Available 2
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction | Jan 5, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 2
Robust Self-Supervised Audio-Visual Speech Recognition | Jan 5, 2022 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available 2
Automated Deep Learning: Neural Architecture Search Is Not the End | Dec 16, 2021 | Deep Learning, Machine Translation | Code Available 2
LightSeq2: Accelerated Training for Transformer-based Models on GPUs | Oct 12, 2021 | Decoder, GPU | Code Available 2
CrypTen: Secure Multi-Party Computation Meets Machine Learning | Sep 2, 2021 | BIG-bench Machine Learning, GPU | Code Available 2
VoiceFilter-Lite: Streaming Targeted Voice Separation for On-Device Speech Recognition | Sep 9, 2020 | CPU, speech-recognition | Code Available 2
Fast Transformers with Clustered Attention | Jul 9, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 2
audino: A Modern Annotation Tool for Audio and Speech | Jun 9, 2020 | Action Detection, Activity Detection | Code Available 2
VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking | Oct 11, 2018 | Speaker Recognition, Speaker Separation | Code Available 2
Training RNNs as Fast as CNNs | Jan 1, 2018 | General Classification, Language Modeling | Code Available 2
Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities | May 23, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 1
From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition | May 22, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 1
Whisper-LM: Improving ASR Models with Language Models for Low-Resource Languages | Mar 30, 2025 | Automatic Speech Recognition, Language Modeling | Code Available 1
MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens | Mar 14, 2025 | Audio-Visual Speech Recognition, Computational Efficiency | Code Available 1
Whisper Speaker Identification: Leveraging Pre-Trained Multilingual Transformers for Robust Speaker Embeddings | Mar 13, 2025 | Speaker Identification, speech-recognition | Code Available 1
Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations | Mar 8, 2025 | Audio-Visual Speech Recognition, Multi-Task Learning | Code Available 1
DuplexMamba: Enhancing Real-time Speech Conversations with Duplex and Streaming Capabilities | Feb 16, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 1
VINP: Variational Bayesian Inference with Neural Speech Prior for Joint ASR-Effective Speech Dereverberation and Blind RIR Identification | Feb 11, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 1
Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models | Feb 9, 2025 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available 1
Sagalee: an Open Source Automatic Speech Recognition Dataset for Oromo Language | Feb 1, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 1
Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation | Jan 23, 2025 | Audio-Visual Speech Recognition, Multi-Task Learning | Code Available 1
FlanEC: Exploring Flan-T5 for Post-ASR Error Correction | Jan 22, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 1
Large Language Models Are Read/Write Policy-Makers for Simultaneous Generation | Jan 1, 2025 | Automatic Speech Recognition, Decoder | Code Available 1
MathSpeech: Leveraging Small LMs for Accurate Conversion in Mathematical Speech-to-Formula | Dec 20, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 1
BackdoorMBTI: A Backdoor Learning Multimodal Benchmark Tool Kit for Backdoor Defense Evaluation | Nov 17, 2024 | Action Recognition, backdoor defense | Code Available 1
XLSR-Mamba: A Dual-Column Bidirectional State Space Model for Spoofing Attack Detection | Nov 15, 2024 | Audio Deepfake Detection, Automatic Speech Recognition | Code Available 1
Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs | Nov 4, 2024 | Lipreading, speech-recognition | Code Available 1
STTATTS: Unified Speech-To-Text And Text-To-Speech Model | Oct 24, 2024 | Multi-Task Learning, speech-recognition | Code Available 1
VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning | Oct 23, 2024 | Question Answering, Speech Recognition | Code Available 1
AlignVSR: Audio-Visual Cross-Modal Alignment for Visual Speech Recognition | Oct 21, 2024 | cross-modal alignment, speech-recognition | Code Available 1
Enhancing Multimodal Sentiment Analysis for Missing Modality through Self-Distillation and Unified Modality Cross-Attention | Oct 19, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 1
EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning | Oct 17, 2024 | Representation Learning, Self-Supervised Learning | Code Available 1
VHASR: A Multimodal Speech Recognition System With Vision Hotwords | Oct 1, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 1
Mamba for Streaming ASR Combined with Unimodal Aggregation | Sep 30, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 1