| Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition for Single and Multi-Person Video | Jan 25, 2022 | Audio-Visual Speech RecognitionAutomatic Speech Recognition | —Unverified | 0 | 0 |
| MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition | Jan 7, 2024 | Audio-Visual Speech RecognitionAutomatic Speech Recognition | —Unverified | 0 | 0 |
| Uncovering the Visual Contribution in Audio-Visual Speech Recognition | Dec 22, 2024 | Audio-Visual Speech RecognitionInformativeness | —Unverified | 0 | 0 |
| Modality Attention for End-to-End Audio-visual Speech Recognition | Nov 13, 2018 | Audio-Visual Speech RecognitionRobust Speech Recognition | —Unverified | 0 | 0 |
| MoHAVE: Mixture of Hierarchical Audio-Visual Experts for Robust Speech Recognition | Feb 11, 2025 | Audio-Visual Speech RecognitionComputational Efficiency | —Unverified | 0 | 0 |
| MSRS: Training Multimodal Speech Recognition Models from Scratch with Sparse Mask Optimization | Jun 25, 2024 | Audio-Visual Speech Recognitionspeech-recognition | —Unverified | 0 | 0 |
| Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer | Mar 14, 2024 | Audio-Visual Speech RecognitionRobust Speech Recognition | —Unverified | 0 | 0 |
| Multimodal Machine Learning: Integrating Language, Vision and Speech | Jul 1, 2017 | Audio-Visual Speech RecognitionBIG-bench Machine Learning | —Unverified | 0 | 0 |
| VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning | Nov 21, 2022 | Audio-Visual Speech RecognitionLanguage Modelling | —Unverified | 0 | 0 |
| A Multi-Purpose Audio-Visual Corpus for Multi-Modal Persian Speech Recognition: the Arman-AV Dataset | Jan 21, 2023 | Audio-Visual Speech RecognitionAutomatic Speech Recognition | —Unverified | 0 | 0 |
| ViCocktail: Automated Multi-Modal Data Collection for Vietnamese Audio-Visual Speech Recognition | Jun 5, 2025 | Audio-Visual Speech Recognitionspeech-recognition | —Unverified | 0 | 0 |
| Visual-Aware Speech Recognition for Noisy Scenarios | Apr 9, 2025 | Audio-Visual Speech RecognitionAutomatic Speech Recognition | —Unverified | 0 | 0 |
| Part-based Lipreading for Audio-Visual Speech Recognition | Dec 14, 2020 | Audio-Visual Speech RecognitionLipreading | —Unverified | 0 | 0 |
| Adapter-Based Multi-Agent AVSR Extension for Pre-Trained ASR Models | Feb 3, 2025 | Audio-Visual Speech Recognitionspeech-recognition | —Unverified | 0 | 0 |
| Quantitative Analysis of Audio-Visual Tasks: An Information-Theoretic Perspective | Sep 29, 2024 | Audio-Visual Speech RecognitionLip Reading | —Unverified | 0 | 0 |
| Recent Progress in the CUHK Dysarthric Speech Recognition System | Jan 15, 2022 | Audio-Visual Speech RecognitionAutomatic Speech Recognition | —Unverified | 0 | 0 |
| Recognition of Isolated Words using Zernike and MFCC features for Audio Visual Speech Recognition | Jul 4, 2014 | Audio-Visual Speech RecognitionAutomatic Speech Recognition | —Unverified | 0 | 0 |
| ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement | Dec 21, 2022 | Audio-Visual Speech RecognitionResynthesis | —Unverified | 0 | 0 |
| ReVISE: Self-Supervised Speech Resynthesis With Visual Input for Universal and Generalized Speech Regeneration | Jan 1, 2023 | Audio-Visual Speech RecognitionResynthesis | —Unverified | 0 | 0 |
| Audio-Visual Speech Recognition is Worth 32328 Voxels | Sep 20, 2021 | Audio-Visual Speech RecognitionAutomatic Speech Recognition | —Unverified | 0 | 0 |
| Audio Visual Speech Recognition using Deep Recurrent Neural Networks | Nov 9, 2016 | Audio-Visual Speech RecognitionAutomatic Speech Recognition | —Unverified | 0 | 0 |
| Audio-Visual Speech Recognition With A Hybrid CTC/Attention Architecture | Sep 28, 2018 | Audio-Visual Speech RecognitionAutomatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| Auxiliary Multimodal LSTM for Audio-visual Speech Recognition and Lipreading | Jan 16, 2017 | Audio-Visual Speech RecognitionAutomatic Speech Recognition | —Unverified | 0 | 0 |
| AV-CPL: Continuous Pseudo-Labeling for Audio-Visual Speech Recognition | Sep 29, 2023 | Audio-Visual Speech RecognitionAutomatic Speech Recognition | —Unverified | 0 | 0 |
| AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations | Feb 10, 2023 | Audio-Visual Speech RecognitionSelf-Supervised Learning | —Unverified | 0 | 0 |