| SlideAVSR: A Dataset of Paper Explanation Videos for Audio-Visual Speech Recognition | Jan 18, 2024 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Unverified | 0 |
| Spatio-Temporal Attention Mechanism and Knowledge Distillation for Lip Reading | Aug 7, 2021 | Audio-Visual Speech Recognition, Knowledge Distillation | Unverified | 0 |
| ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | Jan 1, 2024 | Audio-Visual Speech Recognition, Lipreading | Unverified | 0 |
| Fusing information streams in end-to-end audio-visual speech recognition | Apr 19, 2021 | Audio-Visual Speech Recognition, Lip Reading | Unverified | 0 |
| Streaming Audio-Visual Speech Recognition with Alignment Regularization | Nov 3, 2022 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Unverified | 0 |
| SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer | May 7, 2025 | Audio-Visual Speech Recognition, Lip Reading | Unverified | 0 |
| Towards Lipreading Sentences with Active Appearance Models | May 29, 2018 | Audio-Visual Speech Recognition, Lipreading | Unverified | 0 |
| Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition for Single and Multi-Person Video | Jan 25, 2022 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Unverified | 0 |
| MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition | Jan 7, 2024 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Unverified | 0 |
| Uncovering the Visual Contribution in Audio-Visual Speech Recognition | Dec 22, 2024 | Audio-Visual Speech Recognition, Informativeness | Unverified | 0 |
| Modality Attention for End-to-End Audio-visual Speech Recognition | Nov 13, 2018 | Audio-Visual Speech Recognition, Robust Speech Recognition | Unverified | 0 |
| MoHAVE: Mixture of Hierarchical Audio-Visual Experts for Robust Speech Recognition | Feb 11, 2025 | Audio-Visual Speech Recognition, Computational Efficiency | Unverified | 0 |
| MSRS: Training Multimodal Speech Recognition Models from Scratch with Sparse Mask Optimization | Jun 25, 2024 | Audio-Visual Speech Recognition, speech-recognition | Unverified | 0 |
| Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer | Mar 14, 2024 | Audio-Visual Speech Recognition, Robust Speech Recognition | Unverified | 0 |
| Multimodal Machine Learning: Integrating Language, Vision and Speech | Jul 1, 2017 | Audio-Visual Speech Recognition, BIG-bench Machine Learning | Unverified | 0 |
| VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning | Nov 21, 2022 | Audio-Visual Speech Recognition, Language Modelling | Unverified | 0 |
| A Multi-Purpose Audio-Visual Corpus for Multi-Modal Persian Speech Recognition: the Arman-AV Dataset | Jan 21, 2023 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Unverified | 0 |
| ViCocktail: Automated Multi-Modal Data Collection for Vietnamese Audio-Visual Speech Recognition | Jun 5, 2025 | Audio-Visual Speech Recognition, speech-recognition | Unverified | 0 |
| Visual-Aware Speech Recognition for Noisy Scenarios | Apr 9, 2025 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Unverified | 0 |
| Part-based Lipreading for Audio-Visual Speech Recognition | Dec 14, 2020 | Audio-Visual Speech Recognition, Lipreading | Unverified | 0 |
| Adapter-Based Multi-Agent AVSR Extension for Pre-Trained ASR Models | Feb 3, 2025 | Audio-Visual Speech Recognition, speech-recognition | Unverified | 0 |
| Quantitative Analysis of Audio-Visual Tasks: An Information-Theoretic Perspective | Sep 29, 2024 | Audio-Visual Speech Recognition, Lip Reading | Unverified | 0 |
| Recent Progress in the CUHK Dysarthric Speech Recognition System | Jan 15, 2022 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Unverified | 0 |
| Recognition of Isolated Words using Zernike and MFCC features for Audio Visual Speech Recognition | Jul 4, 2014 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Unverified | 0 |
| ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement | Dec 21, 2022 | Audio-Visual Speech Recognition, Resynthesis | Unverified | 0 |
| ReVISE: Self-Supervised Speech Resynthesis With Visual Input for Universal and Generalized Speech Regeneration | Jan 1, 2023 | Audio-Visual Speech Recognition, Resynthesis | Unverified | 0 |
| Audio-Visual Speech Recognition is Worth 32328 Voxels | Sep 20, 2021 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Unverified | 0 |
| Audio Visual Speech Recognition using Deep Recurrent Neural Networks | Nov 9, 2016 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Unverified | 0 |
| Audio-Visual Speech Recognition With A Hybrid CTC/Attention Architecture | Sep 28, 2018 | Audio-Visual Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Auxiliary Multimodal LSTM for Audio-visual Speech Recognition and Lipreading | Jan 16, 2017 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Unverified | 0 |
| AV-CPL: Continuous Pseudo-Labeling for Audio-Visual Speech Recognition | Sep 29, 2023 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Unverified | 0 |
| AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations | Feb 10, 2023 | Audio-Visual Speech Recognition, Self-Supervised Learning | Unverified | 0 |
| Adaptive Audio-Visual Speech Recognition via Matryoshka-Based Multimodal LLMs | Mar 9, 2025 | Audio-Visual Speech Recognition, Computational Efficiency | Unverified | 0 |
| Building a synchronous corpus of acoustic and 3D facial marker data for adaptive audio-visual speech synthesis | May 1, 2012 | Audio-Visual Speech Recognition, Speech Recognition | Unverified | 0 |
| Chinese-LiPS: A Chinese audio-visual speech recognition dataset with Lip-reading and Presentation Slides | Apr 21, 2025 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Unverified | 0 |
| RUSAVIC Corpus: Russian Audio-Visual Speech in Cars | Jun 1, 2022 | Audio-Visual Speech Recognition, Lip Reading | Unverified | 0 |
| Cocktail-Party Audio-Visual Speech Recognition | Jun 2, 2025 | Audio-Visual Speech Recognition, speech-recognition | Unverified | 0 |
| Audio-Visual Speech and Gesture Recognition by Sensors of Mobile Devices | Feb 17, 2023 | Audio-Visual Speech Recognition, Gesture Recognition | Unverified | 0 |
| Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach | May 20, 2025 | Audio-Visual Speech Recognition, Mixture-of-Experts | Unverified | 0 |
| DCIM-AVSR: Efficient Audio-Visual Speech Recognition via Dual Conformer Interaction Module | Aug 31, 2024 | Audio-Visual Speech Recognition, speech-recognition | Unverified | 0 |
| Visual Speech Recognition | Sep 3, 2014 | Audio-Visual Speech Recognition, Lip Reading | Unverified | 0 |
| Deep Multimodal Learning for Audio-Visual Speech Recognition | Jan 22, 2015 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Unverified | 0 |
| Deep Multimodal Representation Learning from Temporal Data | Apr 11, 2017 | Audio-Visual Speech Recognition, Representation Learning | Unverified | 0 |
| Listening and Seeing Again: Generative Error Correction for Audio-Visual Speech Recognition | Jan 3, 2025 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available | 0 |
| LRS3-TED: a large-scale dataset for visual speech recognition | Sep 3, 2018 | Audio-Visual Speech Recognition, speech-recognition | Code Available | 0 |
| Audio-Visual Speech Recognition based on Regulated Transformer and Spatio-Temporal Fusion Strategy for Driver Assistive Systems | May 9, 2024 | Audio-Visual Speech Recognition, Lipreading | Code Available | 0 |
| Recurrent Neural Network Transducer for Audio-Visual Speech Recognition | Nov 8, 2019 | Audio-Visual Speech Recognition, Lipreading | Code Available | 0 |
| A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition | Mar 7, 2024 | Audio-Visual Speech Recognition, Knowledge Distillation | Code Available | 0 |
| Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation | Jan 7, 2024 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available | 0 |
| SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data | Aug 1, 2024 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available | 0 |