| Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing | Feb 23, 2024 | Lipreading, Lip Reading | Code Available | 3 | 5 |
| Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation | Jun 14, 2024 | Audio-Visual Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 3 | 5 |
| mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition | Feb 3, 2025 | Audio-Visual Speech Recognition, Decoder | Code Available | 3 | 5 |
| SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization | Jun 18, 2024 | Landmark-based Lipreading, Lipreading | Code Available | 2 | 5 |
| MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation | Mar 1, 2023 | Audio-Visual Speech Recognition, Robust Speech Recognition | Code Available | 2 | 5 |
| Visual Speech Recognition for Multiple Languages in the Wild | Feb 26, 2022 | Hyperparameter Optimization, Lipreading | Code Available | 2 | 5 |
| Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels | Mar 25, 2023 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available | 2 | 5 |
| CoGenAV: Versatile Audio-Visual Representation Learning via Contrastive-Generative Synchronization | May 6, 2025 | Active Speaker Detection, Audio-Visual Speech Recognition | Code Available | 2 | 5 |
| Large Language Models are Strong Audio-Visual Speech Recognition Learners | Sep 18, 2024 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available | 2 | 5 |
| Robust Self-Supervised Audio-Visual Speech Recognition | Jan 5, 2022 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available | 2 | 5 |