MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation Mar 1, 2023 Audio-Visual Speech Recognition Robust Speech Recognition
Paralinguistics-Aware Speech-Empowered Large Language Models for Natural Conversation Feb 8, 2024 Automatic Speech Recognition Automatic Speech Recognition (ASR)
NusaCrowd: Open Source Initiative for Indonesian NLP Resources Dec 19, 2022 Automatic Speech Recognition Automatic Speech Recognition (ASR)
MambAttention: Mamba with Multi-Head Attention for Generalizable Single-Channel Speech Enhancement Jul 1, 2025 Automatic Speech Recognition Mamba
BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric Dec 16, 2022 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Mamba in Speech: Towards an Alternative to Self-Attention May 21, 2024 Mamba Speech Enhancement
LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation Feb 27, 2025 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction Jan 5, 2022 Automatic Speech Recognition Automatic Speech Recognition (ASR)
LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of end-to-end ASR Models Oct 4, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges Apr 24, 2024 Drug Design Inductive Bias
MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages Oct 1, 2024 Automatic Speech Recognition speech-recognition
Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions Sep 13, 2024 Automatic Speech Recognition Automatic Speech Recognition (ASR)
ICASSP 2022 Acoustic Echo Cancellation Challenge Feb 27, 2022 Acoustic echo cancellation Speech Enhancement
Large Language Models are Efficient Learners of Noise-Robust Speech Recognition Jan 19, 2024 Automatic Speech Recognition Automatic Speech Recognition (ASR)
FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec Sep 14, 2023 Automatic Speech Recognition speech-recognition
Fast Transformers with Clustered Attention Jul 9, 2020 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Large Language Models are Strong Audio-Visual Speech Recognition Learners Sep 18, 2024 Audio-Visual Speech Recognition Automatic Speech Recognition
LightSeq2: Accelerated Training for Transformer-based Models on GPUs Oct 12, 2021 Decoder GPU
DiCoW: Diarization-Conditioned Whisper for Target Speaker Automatic Speech Recognition Dec 30, 2024 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Automated Deep Learning: Neural Architecture Search Is Not the End Dec 16, 2021 Deep Learning Machine Translation
emg2qwerty: A Large Dataset with Baselines for Touch Typing using Surface Electromyography Oct 26, 2024 Automatic Speech Recognition Automatic Speech Recognition (ASR)
audino: A Modern Annotation Tool for Audio and Speech Jun 9, 2020 Action Detection Activity Detection
Dialectal Coverage And Generalization in Arabic Speech Recognition Nov 7, 2024 Arabic Speech Recognition Automatic Speech Recognition
CoGenAV: Versatile Audio-Visual Representation Learning via Contrastive-Generative Synchronization May 6, 2025 Active Speaker Detection Audio-Visual Speech Recognition
FAdam: Adam is a natural gradient optimizer using diagonal empirical Fisher information May 21, 2024 Speech Recognition
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT Oct 7, 2023 Audio captioning Automatic Speech Recognition
Attention-Based Models for Speech Recognition Jun 24, 2015 Machine Translation Phoneme Recognition
Attention-based Contextual Language Model Adaptation for Speech Recognition Jun 2, 2021 Automatic Speech Recognition Automatic Speech Recognition (ASR)
CTC-synchronous Training for Monotonic Attention Model May 10, 2020 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Attention-based Audio-Visual Fusion for Robust Automatic Speech Recognition Sep 5, 2018 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Attack on practical speaker verification system using universal adversarial perturbations May 19, 2021 Real-World Adversarial Attack Room Impulse Response (RIR)
D4AM: A General Denoising Framework for Downstream Acoustic Models Nov 28, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Cross Attention Augmented Transducer Networks for Simultaneous Translation Nov 1, 2021 Automatic Speech Recognition Automatic Speech Recognition (ASR)
A transfer learning based approach for pronunciation scoring Nov 1, 2021 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition May 16, 2023 Audio-Visual Speech Recognition Automatic Speech Recognition
ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications Nov 8, 2022 Automatic Speech Recognition Automatic Speech Recognition (ASR)
CPrune: Compiler-Informed Model Pruning for Efficient Target-Aware DNN Execution Jul 4, 2022 Compiler Optimization image-classification
Cross-Speaker Encoding Network for Multi-Talker Speech Recognition Jan 8, 2024 Decoder speech-recognition
Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities May 23, 2025 Automatic Speech Recognition Automatic Speech Recognition (ASR)
A Comparison of Methods for OOV-word Recognition on a New Public Dataset Jul 16, 2021 Automatic Speech Recognition Automatic Speech Recognition (ASR)
A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond Apr 20, 2022 Automatic Speech Recognition Automatic Speech Recognition (ASR)
A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement Jun 22, 2022 Automatic Speech Recognition Automatic Speech Recognition (ASR)
ALIF: Low-Cost Adversarial Audio Attacks on Black-Box Speech Platforms using Linguistic Features Aug 3, 2024 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Convolutional Neural Network (CNN) to reduce construction loss in JPEG compression caused by Discrete Fourier Transform (DFT) Aug 26, 2022 Data Compression Image Compression
Controlling Whisper: Universal Acoustic Adversarial Attacks to Control Speech Foundation Models Jul 5, 2024 Adversarial Attack Automatic Speech Recognition
CopyNE: Better Contextual ASR by Copying Named Entities May 22, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Attention model for articulatory features detection Jul 2, 2019 Manner Of Articulation Detection model
A Study of Multilingual End-to-End Speech Recognition for Kazakh, Russian, and English Aug 3, 2021 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Contrastive Learning-Based Audio to Lyrics Alignment for Multiple Languages Jun 13, 2023 Contrastive Learning speech-recognition
CORAA: a large corpus of spontaneous and prepared speech manually validated for speech recognition in Brazilian Portuguese Oct 14, 2021 Automatic Speech Recognition Automatic Speech Recognition (ASR)