WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing (Oct 26, 2021). Tasks: Denoising, Self-Supervised Learning
Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates (Sep 27, 2021). Tasks: Automatic Speech Recognition, Automatic Speech Recognition (ASR)
Datasets: A Community Library for Natural Language Processing (Sep 7, 2021). Tasks: Image Classification, Object Recognition
W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training (Aug 7, 2021). Tasks: Contrastive Learning, Language Modeling
A Parallelizable Lattice Rescoring Strategy with Neural Language Models (Mar 8, 2021). Tasks: ARC, Automatic Speech Recognition
WeNet: Production oriented Streaming and Non-streaming End-to-End Speech Recognition Toolkit (Feb 2, 2021). Tasks: Decoder, speech-recognition
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations (Jun 20, 2020). Tasks: Quantization, Self-Supervised Learning
Conformer: Convolution-augmented Transformer for Speech Recognition (May 16, 2020). Tasks: Automatic Speech Recognition, Automatic Speech Recognition (ASR)
Semi-Supervised Speech Recognition via Local Prior Matching (Feb 24, 2020). Tasks: Knowledge Distillation, Language Modeling
wav2letter++: The Fastest Open-source Speech Recognition System (Dec 18, 2018). Tasks: Speech Recognition
TED-LIUM 3: twice as much data and corpus repartition for experiments on speaker adaptation (May 12, 2018). Tasks: Automatic Speech Recognition, Automatic Speech Recognition (ASR)
MambAttention: Mamba with Multi-Head Attention for Generalizable Single-Channel Speech Enhancement (Jul 1, 2025). Tasks: Automatic Speech Recognition, Mamba
CoGenAV: Versatile Audio-Visual Representation Learning via Contrastive-Generative Synchronization (May 6, 2025). Tasks: Active Speaker Detection, Audio-Visual Speech Recognition
TeleAntiFraud-28k: An Audio-Text Slow-Thinking Dataset for Telecom Fraud Detection (Mar 31, 2025). Tasks: Fraud Detection, Large Language Model
A Noise-Robust Turn-Taking System for Real-World Dialogue Robots: A Field Experiment (Mar 8, 2025). Tasks: speech-recognition, Speech Recognition
LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation (Feb 27, 2025). Tasks: Automatic Speech Recognition, Automatic Speech Recognition (ASR)
CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR (Feb 27, 2025). Tasks: Automatic Speech Recognition, Automatic Speech Recognition (ASR)
DiCoW: Diarization-Conditioned Whisper for Target Speaker Automatic Speech Recognition (Dec 30, 2024). Tasks: Automatic Speech Recognition, Automatic Speech Recognition (ASR)
Streaming Keyword Spotting Boosted by Cross-layer Discrimination Consistency (Dec 17, 2024). Tasks: Automatic Speech Recognition, Automatic Speech Recognition (ASR)
Dialectal Coverage And Generalization in Arabic Speech Recognition (Nov 7, 2024). Tasks: Arabic Speech Recognition, Automatic Speech Recognition
emg2qwerty: A Large Dataset with Baselines for Touch Typing using Surface Electromyography (Oct 26, 2024). Tasks: Automatic Speech Recognition, Automatic Speech Recognition (ASR)
Recent Advances in Speech Language Models: A Survey (Oct 1, 2024). Tasks: Automatic Speech Recognition, Automatic Speech Recognition (ASR)
MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages (Oct 1, 2024). Tasks: Automatic Speech Recognition, speech-recognition
Large Language Models are Strong Audio-Visual Speech Recognition Learners (Sep 18, 2024). Tasks: Audio-Visual Speech Recognition, Automatic Speech Recognition
Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions (Sep 13, 2024). Tasks: Automatic Speech Recognition, Automatic Speech Recognition (ASR)
wav2graph: A Framework for Supervised Learning Knowledge Graph from Speech (Aug 8, 2024). Tasks: Automatic Speech Recognition, Automatic Speech Recognition (ASR)
Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis (Jul 13, 2024). Tasks: Mamba, speech-recognition
Pretraining End-to-End Keyword Search with Automatically Discovered Acoustic Units (Jul 5, 2024). Tasks: Acoustic Unit Discovery, Automatic Speech Recognition
SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization (Jun 18, 2024). Tasks: Landmark-based Lipreading, Lipreading
Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection (Jun 14, 2024). Tasks: Decoder, speech-recognition
BLSP-Emo: Towards Empathetic Large Speech-Language Models (Jun 6, 2024). Tasks: Emotion Recognition, Instruction Following
TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation (May 28, 2024). Tasks: Machine Translation, speech-recognition
Let's Fuse Step by Step: A Generative Fusion Decoding Algorithm with LLMs for Multi-modal Text Recognition (May 23, 2024). Tasks: Automatic Speech Recognition, Automatic Speech Recognition (ASR)
FAdam: Adam is a natural gradient optimizer using diagonal empirical Fisher information (May 21, 2024). Tasks: Speech Recognition
Mamba in Speech: Towards an Alternative to Self-Attention (May 21, 2024). Tasks: Mamba, Speech Enhancement
Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges (Apr 24, 2024). Tasks: Drug Design, Inductive Bias
BRAVEn: Improving Self-Supervised Pre-training for Visual and Auditory Speech Recognition (Apr 2, 2024). Tasks: speech-recognition, Speech Recognition
PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings (Mar 4, 2024). Tasks: Automatic Speech Recognition, Automatic Speech Recognition (ASR)
HINT: High-quality INPainting Transformer with Mask-Aware Encoding and Enhanced Attention (Feb 22, 2024). Tasks: Image Inpainting, speech-recognition
An Embarrassingly Simple Approach for LLM with Strong ASR Capacity (Feb 13, 2024). Tasks: Automatic Speech Recognition, Automatic Speech Recognition (ASR)
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension (Feb 12, 2024). Tasks: 2k, Automatic Speech Recognition
Paralinguistics-Aware Speech-Empowered Large Language Models for Natural Conversation (Feb 8, 2024). Tasks: Automatic Speech Recognition, Automatic Speech Recognition (ASR)
Large Language Models are Efficient Learners of Noise-Robust Speech Recognition (Jan 19, 2024). Tasks: Automatic Speech Recognition, Automatic Speech Recognition (ASR)
Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition (Oct 10, 2023). Tasks: Automatic Speech Recognition, Automatic Speech Recognition (ASR)
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT (Oct 7, 2023). Tasks: Audio captioning, Automatic Speech Recognition
LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of end-to-end ASR Models (Oct 4, 2023). Tasks: Automatic Speech Recognition, Automatic Speech Recognition (ASR)
PromptASR for contextualized ASR with controllable style (Sep 14, 2023). Tasks: Automatic Speech Recognition, speech-recognition
FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec (Sep 14, 2023). Tasks: Automatic Speech Recognition, speech-recognition
Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels (Mar 25, 2023). Tasks: Audio-Visual Speech Recognition, Automatic Speech Recognition
Stabilizing Transformer Training by Preventing Attention Entropy Collapse (Mar 11, 2023). Tasks: Automatic Speech Recognition, image-classification