| Audio-Visual Speech Recognition is Worth 32×32×8 Voxels | Sep 20, 2021 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Unverified | 0 |
| Audio Visual Speech Recognition using Deep Recurrent Neural Networks | Nov 9, 2016 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Unverified | 0 |
| Audio-Visual Speech Recognition With A Hybrid CTC/Attention Architecture | Sep 28, 2018 | Audio-Visual Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Auxiliary Multimodal LSTM for Audio-visual Speech Recognition and Lipreading | Jan 16, 2017 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Unverified | 0 |
| AV-CPL: Continuous Pseudo-Labeling for Audio-Visual Speech Recognition | Sep 29, 2023 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Unverified | 0 |
| AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations | Feb 10, 2023 | Audio-Visual Speech Recognition, Self-Supervised Learning | Unverified | 0 |
| Adaptive Audio-Visual Speech Recognition via Matryoshka-Based Multimodal LLMs | Mar 9, 2025 | Audio-Visual Speech Recognition, Computational Efficiency | Unverified | 0 |
| Building a synchronous corpus of acoustic and 3D facial marker data for adaptive audio-visual speech synthesis | May 1, 2012 | Audio-Visual Speech Recognition, Speech Recognition | Unverified | 0 |
| Chinese-LiPS: A Chinese audio-visual speech recognition dataset with Lip-reading and Presentation Slides | Apr 21, 2025 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Unverified | 0 |
| SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data | Aug 1, 2024 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Unverified | 0 |
| RUSAVIC Corpus: Russian Audio-Visual Speech in Cars | Jun 1, 2022 | Audio-Visual Speech Recognition, Lip Reading | Unverified | 0 |
| Cocktail-Party Audio-Visual Speech Recognition | Jun 2, 2025 | Audio-Visual Speech Recognition, speech-recognition | Unverified | 0 |
| Audio-Visual Speech and Gesture Recognition by Sensors of Mobile Devices | Feb 17, 2023 | Audio-Visual Speech Recognition, Gesture Recognition | Unverified | 0 |
| Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach | May 20, 2025 | Audio-Visual Speech Recognition, Mixture-of-Experts | Unverified | 0 |
| DCIM-AVSR : Efficient Audio-Visual Speech Recognition via Dual Conformer Interaction Module | Aug 31, 2024 | Audio-Visual Speech Recognition, speech-recognition | Unverified | 0 |
| Visual Speech Recognition | Sep 3, 2014 | Audio-Visual Speech Recognition, Lip Reading | Unverified | 0 |
| Deep Multimodal Learning for Audio-Visual Speech Recognition | Jan 22, 2015 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Unverified | 0 |
| Deep Multimodal Representation Learning from Temporal Data | Apr 11, 2017 | Audio-Visual Speech Recognition, Representation Learning | Unverified | 0 |
| Detecting Adversarial Attacks On Audiovisual Speech Recognition | Dec 18, 2019 | Audio-Visual Speech Recognition, speech-recognition | Unverified | 0 |
| Listening and Seeing Again: Generative Error Correction for Audio-Visual Speech Recognition | Jan 3, 2025 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available | 0 |
| LRS3-TED: a large-scale dataset for visual speech recognition | Sep 3, 2018 | Audio-Visual Speech Recognition, speech-recognition | Code Available | 0 |
| Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation | Jan 7, 2024 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available | 0 |
| Audio-Visual Speech Recognition based on Regulated Transformer and Spatio-Temporal Fusion Strategy for Driver Assistive Systems | May 9, 2024 | Audio-Visual Speech Recognition, Lipreading | Code Available | 0 |
| Recurrent Neural Network Transducer for Audio-Visual Speech Recognition | Nov 8, 2019 | Audio-Visual Speech Recognition, Lipreading | Code Available | 0 |
| A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition | Mar 7, 2024 | Audio-Visual Speech Recognition, Knowledge Distillation | Code Available | 0 |