| Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition | May 16, 2023 | Audio-Visual Speech RecognitionAutomatic Speech Recognition | CodeCode Available | 1 | 5 |
| Learn an Effective Lip Reading Model without Pains | Nov 15, 2020 | LipreadingLip Reading | CodeCode Available | 1 | 5 |
| MIR-GAN: Refining Frame-Level Modality-Invariant Representations with Adversarial Network for Audio-Visual Speech Recognition | Jun 18, 2023 | Audio-Visual Speech RecognitionRepresentation Learning | CodeCode Available | 1 | 5 |
| Deep Audio-Visual Speech Recognition | Sep 6, 2018 | Audio-Visual Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 1 | 5 |
| Lips Don't Lie: A Generalisable and Robust Approach to Face Forgery Detection | Dec 14, 2020 | DeepFake DetectionLipreading | CodeCode Available | 1 | 5 |
| Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper | Sep 15, 2023 | Language Identificationspeech-recognition | CodeCode Available | 1 | 5 |
| MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens | Mar 14, 2025 | Audio-Visual Speech RecognitionComputational Efficiency | CodeCode Available | 1 | 5 |
| Recurrent Neural Network Transducer for Audio-Visual Speech Recognition | Nov 8, 2019 | Audio-Visual Speech RecognitionLipreading | CodeCode Available | 0 | 5 |
| A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition | Mar 7, 2024 | Audio-Visual Speech RecognitionKnowledge Distillation | CodeCode Available | 0 | 5 |
| Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation | Jan 7, 2024 | Audio-Visual Speech RecognitionAutomatic Speech Recognition | CodeCode Available | 0 | 5 |