W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training Aug 7, 2021 Contrastive Learning Language Modeling
Code Code Available 35 GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement Jun 17, 2024 speech-recognition Speech Recognition
Code Code Available 35 wav2letter++: The Fastest Open-source Speech Recognition System Dec 18, 2018 Speech Recognition
Code Code Available 35 WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing Oct 26, 2021 Denoising Self-Supervised Learning
Code Code Available 35 Datasets: A Community Library for Natural Language Processing Sep 7, 2021 Image Classification Object Recognition
Code Code Available 35 Delay-penalized transducer for low-latency streaming ASR Oct 31, 2022 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Code Code Available 35 Conformer: Convolution-augmented Transformer for Speech Recognition May 16, 2020 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Code Code Available 35 OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia Jan 23, 2025 Emotion Recognition Event Detection
Code Code Available 35 A Parallelizable Lattice Rescoring Strategy with Neural Language Models Mar 8, 2021 ARC Automatic Speech Recognition
Code Code Available 35 DiarizationLM: Speaker Diarization Post-Processing with Large Language Models Jan 7, 2024 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Code Code Available 35 VoiceBench: Benchmarking LLM-Based Voice Assistants Oct 22, 2024 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Code Code Available 35 SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization Jun 18, 2024 Landmark-based Lipreading Lipreading
Code Code Available 25 A Noise-Robust Turn-Taking System for Real-World Dialogue Robots: A Field Experiment Mar 8, 2025 speech-recognition Speech Recognition
Code Code Available 25 Streaming Keyword Spotting Boosted by Cross-layer Discrimination Consistency Dec 17, 2024 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Code Code Available 25 Stabilizing Transformer Training by Preventing Attention Entropy Collapse Mar 11, 2023 Automatic Speech Recognition image-classification
Code Code Available 25 TeleAntiFraud-28k: An Audio-Text Slow-Thinking Dataset for Telecom Fraud Detection Mar 31, 2025 Fraud Detection Large Language Model
Code Code Available 25 An Embarrassingly Simple Approach for LLM with Strong ASR Capacity Feb 13, 2024 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Code Code Available 25 Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis Jul 13, 2024 Mamba speech-recognition
Code Code Available 25 BLSP-Emo: Towards Empathetic Large Speech-Language Models Jun 6, 2024 Emotion Recognition Instruction Following
Code Code Available 25 BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric Dec 16, 2022 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Code Code Available 25 BRAVEn: Improving Self-Supervised Pre-training for Visual and Auditory Speech Recognition Apr 2, 2024 speech-recognition Speech Recognition
Code Code Available 25 Squeezeformer: An Efficient Transformer for Automatic Speech Recognition Jun 2, 2022 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Code Code Available 25 TEVR: Improving Speech Recognition by Token Entropy Variance Reduction Jun 25, 2022 Automatic Speech Recognition (ASR) Language Modeling
Code Code Available 25 Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection Jun 14, 2024 Decoder speech-recognition
Code Code Available 25 SoundSpaces 2.0: A Simulation Platform for Visual-Acoustic Learning Jun 16, 2022 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Code Code Available 25 Recent Advances in Speech Language Models: A Survey Oct 1, 2024 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Code Code Available 25 Robust Self-Supervised Audio-Visual Speech Recognition Jan 5, 2022 Audio-Visual Speech Recognition Automatic Speech Recognition
Code Code Available 25 PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings Mar 4, 2024 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Code Code Available 25 Pretraining End-to-End Keyword Search with Automatically Discovered Acoustic Units Jul 5, 2024 Acoustic Unit Discovery Automatic Speech Recognition
Code Code Available 25 AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension Feb 12, 2024 2k Automatic Speech Recognition
Code Code Available 25 PromptASR for contextualized ASR with controllable style Sep 14, 2023 Automatic Speech Recognition speech-recognition
Code Code Available 25 MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation Mar 1, 2023 Audio-Visual Speech Recognition Robust Speech Recognition
Code Code Available 25 MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages Oct 1, 2024 Automatic Speech Recognition speech-recognition
Code Code Available 25 Mamba in Speech: Towards an Alternative to Self-Attention May 21, 2024 Mamba Speech Enhancement
Code Code Available 25 Liquid Structural State-Space Models Sep 26, 2022 Heart rate estimation Long-range modeling
Code Code Available 25 Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels Mar 25, 2023 Audio-Visual Speech Recognition Automatic Speech Recognition
Code Code Available 25 LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation Feb 27, 2025 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Code Code Available 25 MambAttention: Mamba with Multi-Head Attention for Generalizable Single-Channel Speech Enhancement Jul 1, 2025 Automatic Speech Recognition Mamba
Code Code Available 25 Automated Deep Learning: Neural Architecture Search Is Not the End Dec 16, 2021 Deep Learning Machine Translation
Code Code Available 25 Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges Apr 24, 2024 Drug Design Inductive Bias
Code Code Available 25 A New Frontier of AI: On-Device AI Training and Personalization Jun 9, 2022 Efficient Neural Network speech-recognition
Code Code Available 25 audino: A Modern Annotation Tool for Audio and Speech Jun 9, 2020 Action Detection Activity Detection
Code Code Available 25 4-bit Conformer with Native Quantization Aware Training for Speech Recognition Mar 29, 2022 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Code Code Available 25 Let's Fuse Step by Step: A Generative Fusion Decoding Algorithm with LLMs for Multi-modal Text Recognition May 23, 2024 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Code Code Available 25 LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT Oct 7, 2023 Audio captioning Automatic Speech Recognition
Code Code Available 25 Large Language Models are Strong Audio-Visual Speech Recognition Learners Sep 18, 2024 Audio-Visual Speech Recognition Automatic Speech Recognition
Code Code Available 25 Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction Jan 5, 2022 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Code Code Available 25 LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of end-to-end ASR Models Oct 4, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Code Code Available 25 Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions Sep 13, 2024 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Code Code Available 25 HINT: High-quality INPainting Transformer with Mask-Aware Encoding and Enhanced Attention Feb 22, 2024 Image Inpainting speech-recognition
Code Code Available 25