wav2letter++: The Fastest Open-source Speech Recognition System | Dec 18, 2018 | Speech Recognition
WavChat: A Survey of Spoken Dialogue Models | Nov 15, 2024 | speech-recognition, Speech Recognition
Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing | Feb 23, 2024 | Lipreading, Lip Reading
Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models | May 23, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR)
Datasets: A Community Library for Natural Language Processing | Sep 7, 2021 | Image Classification, Object Recognition
Conformer: Convolution-augmented Transformer for Speech Recognition | May 16, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR)
VoiceBench: Benchmarking LLM-Based Voice Assistants | Oct 22, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR)
Delay-penalized transducer for low-latency streaming ASR | Oct 31, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR)
A Parallelizable Lattice Rescoring Strategy with Neural Language Models | Mar 8, 2021 | ARC, Automatic Speech Recognition
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play | May 5, 2025 | AI Agent, Automatic Speech Recognition
Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation | Jun 14, 2024 | Audio-Visual Speech Recognition, Automatic Speech Recognition (ASR)
SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization | Jun 18, 2024 | Landmark-based Lipreading, Lipreading
BRAVEn: Improving Self-Supervised Pre-training for Visual and Auditory Speech Recognition | Apr 2, 2024 | speech-recognition, Speech Recognition
BLSP-Emo: Towards Empathetic Large Speech-Language Models | Jun 6, 2024 | Emotion Recognition, Instruction Following
Streaming Keyword Spotting Boosted by Cross-layer Discrimination Consistency | Dec 17, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR)
TeleAntiFraud-28k: An Audio-Text Slow-Thinking Dataset for Telecom Fraud Detection | Mar 31, 2025 | Fraud Detection, Large Language Model
Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis | Jul 13, 2024 | Mamba, speech-recognition
BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric | Dec 16, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR)
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition | Jun 2, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR)
Stabilizing Transformer Training by Preventing Attention Entropy Collapse | Mar 11, 2023 | Automatic Speech Recognition, image-classification
TEVR: Improving Speech Recognition by Token Entropy Variance Reduction | Jun 25, 2022 | Automatic Speech Recognition (ASR), Language Modeling
Robust Self-Supervised Audio-Visual Speech Recognition | Jan 5, 2022 | Audio-Visual Speech Recognition, Automatic Speech Recognition
Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection | Jun 14, 2024 | Decoder, speech-recognition
Pretraining End-to-End Keyword Search with Automatically Discovered Acoustic Units | Jul 5, 2024 | Acoustic Unit Discovery, Automatic Speech Recognition
PromptASR for contextualized ASR with controllable style | Sep 14, 2023 | Automatic Speech Recognition, speech-recognition
Automated Deep Learning: Neural Architecture Search Is Not the End | Dec 16, 2021 | Deep Learning, Machine Translation
NusaCrowd: Open Source Initiative for Indonesian NLP Resources | Dec 19, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR)
PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings | Mar 4, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR)
Recent Advances in Speech Language Models: A Survey | Oct 1, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR)
SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning | Jun 16, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR)
MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages | Oct 1, 2024 | Automatic Speech Recognition, speech-recognition
audino: A Modern Annotation Tool for Audio and Speech | Jun 9, 2020 | Action Detection, Activity Detection
MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation | Mar 1, 2023 | Audio-Visual Speech Recognition, Robust Speech Recognition
Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges | Apr 24, 2024 | Drug Design, Inductive Bias
LightSeq2: Accelerated Training for Transformer-based Models on GPUs | Oct 12, 2021 | Decoder, GPU
LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of end-to-end ASR Models | Oct 4, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR)
Liquid Structural State-Space Models | Sep 26, 2022 | Heart rate estimation, Long-range modeling
Mamba in Speech: Towards an Alternative to Self-Attention | May 21, 2024 | Mamba, Speech Enhancement
u-HuBERT: Unified Mixed-Modal Speech Pretraining And Zero-Shot Transfer to Unlabeled Modality | Jul 14, 2022 | Speaker Verification, speech-recognition
LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation | Feb 27, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR)
4-bit Conformer with Native Quantization Aware Training for Speech Recognition | Mar 29, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR)
MambAttention: Mamba with Multi-Head Attention for Generalizable Single-Channel Speech Enhancement | Jul 1, 2025 | Automatic Speech Recognition, Mamba
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT | Oct 7, 2023 | Audio captioning, Automatic Speech Recognition
Large Language Models are Efficient Learners of Noise-Robust Speech Recognition | Jan 19, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR)
Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions | Sep 13, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR)
Large Language Models are Strong Audio-Visual Speech Recognition Learners | Sep 18, 2024 | Audio-Visual Speech Recognition, Automatic Speech Recognition
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction | Jan 5, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR)
ICASSP 2022 Acoustic Echo Cancellation Challenge | Feb 27, 2022 | Acoustic echo cancellation, Speech Enhancement
HINT: High-quality INPainting Transformer with Mask-Aware Encoding and Enhanced Attention | Feb 22, 2024 | Image Inpainting, speech-recognition
Let's Fuse Step by Step: A Generative Fusion Decoding Algorithm with LLMs for Multi-modal Text Recognition | May 23, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR)