| Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition | May 16, 2023 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available | 1 |
| MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition | Mar 9, 2023 | Lip Reading, Machine Translation | Code Available | 1 |
| Do VSR Models Generalize Beyond LRS3? | Nov 23, 2023 | Lip Reading, speech-recognition | Code Available | 1 |
| Deep Audio-Visual Speech Recognition | Sep 6, 2018 | Audio-Visual Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation | Jan 23, 2025 | Audio-Visual Speech Recognition, Multi-Task Learning | Code Available | 1 |
| How to Teach DNNs to Pay Attention to the Visual Modality in Speech Recognition | Apr 17, 2020 | Audio-Visual Speech Recognition, speech-recognition | Code Available | 1 |
| Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition | Feb 24, 2022 | Audio-Visual Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Chinese-LiPS: A Chinese audio-visual speech recognition dataset with Lip-reading and Presentation Slides | Apr 21, 2025 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Unverified | 0 |
| Audio-visual Recognition of Overlapped speech for the LRS2 dataset | Jan 6, 2020 | Audio-Visual Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Building a synchronous corpus of acoustic and 3D facial marker data for adaptive audio-visual speech synthesis | May 1, 2012 | Audio-Visual Speech Recognition, Speech Recognition | Unverified | 0 |