| AV Taris: Online Audio-Visual Speech Recognition | Dec 14, 2020 | Action Detection, Activity Detection | Code Available | 1 | 5 |
| Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition | Feb 24, 2022 | Audio-Visual Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 | 5 |
| Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition | Mar 6, 2020 | Lipreading, Lip Reading | Code Available | 1 | 5 |
| The NPU-ASLP-LiAuto System Description for Visual Speech Recognition in CNVSRC 2023 | Jan 7, 2024 | Decoder, speech-recognition | Code Available | 1 | 5 |
| CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition | Jan 11, 2022 | Audio-Visual Speech Recognition, speech-recognition | Code Available | 1 | 5 |
| How to Teach DNNs to Pay Attention to the Visual Modality in Speech Recognition | Apr 17, 2020 | Audio-Visual Speech Recognition, speech-recognition | Code Available | 1 | 5 |
| Visual Context-driven Audio Feature Enhancement for Robust End-to-End Audio-Visual Speech Recognition | Jul 13, 2022 | Audio-Visual Speech Recognition, Decoder | Code Available | 1 | 5 |
| Do VSR Models Generalize Beyond LRS3? | Nov 23, 2023 | Lip Reading, speech-recognition | Code Available | 1 | 5 |
| AlignVSR: Audio-Visual Cross-Modal Alignment for Visual Speech Recognition | Oct 21, 2024 | cross-modal alignment, speech-recognition | Code Available | 1 | 5 |
| End-to-end Audio-visual Speech Recognition with Conformers | Feb 12, 2021 | Audio-Visual Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 | 5 |
| Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models | Feb 9, 2025 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available | 1 | 5 |
| Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition | Jun 18, 2023 | Audio-Visual Speech Recognition, speech-recognition | Code Available | 1 | 5 |
| Tailored Design of Audio-Visual Speech Recognition Models using Branchformers | Jul 9, 2024 | Audio-Visual Speech Recognition, speech-recognition | Code Available | 1 | 5 |
| CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition | Jun 1, 2022 | Audio-Visual Speech Recognition, speech-recognition | Code Available | 1 | 5 |
| Improving Audio-Visual Speech Recognition by Lip-Subword Correlation Based Visual Pre-training and Cross-Modal Fusion Encoder | Aug 14, 2023 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available | 1 | 5 |
| Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition | May 16, 2023 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available | 1 | 5 |
| Learn an Effective Lip Reading Model without Pains | Nov 15, 2020 | Lipreading, Lip Reading | Code Available | 1 | 5 |
| MIR-GAN: Refining Frame-Level Modality-Invariant Representations with Adversarial Network for Audio-Visual Speech Recognition | Jun 18, 2023 | Audio-Visual Speech Recognition, Representation Learning | Code Available | 1 | 5 |
| Deep Audio-Visual Speech Recognition | Sep 6, 2018 | Audio-Visual Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 | 5 |
| Lips Don't Lie: A Generalisable and Robust Approach to Face Forgery Detection | Dec 14, 2020 | DeepFake Detection, Lipreading | Code Available | 1 | 5 |
| Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper | Sep 15, 2023 | Language Identification, speech-recognition | Code Available | 1 | 5 |
| MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens | Mar 14, 2025 | Audio-Visual Speech Recognition, Computational Efficiency | Code Available | 1 | 5 |
| Recurrent Neural Network Transducer for Audio-Visual Speech Recognition | Nov 8, 2019 | Audio-Visual Speech Recognition, Lipreading | Code Available | 0 | 5 |
| A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition | Mar 7, 2024 | Audio-Visual Speech Recognition, Knowledge Distillation | Code Available | 0 | 5 |
| Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation | Jan 7, 2024 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available | 0 | 5 |