| Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach | May 20, 2025 | Audio-Visual Speech Recognition, Mixture-of-Experts | Unverified | 0 |
| SlideAVSR: A Dataset of Paper Explanation Videos for Audio-Visual Speech Recognition | Jan 18, 2024 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Unverified | 0 |
| SparseVSR: Lightweight and Noise Robust Visual Speech Recognition | Jul 10, 2023 | speech-recognition, Speech Recognition | Unverified | 0 |
| Spatio-Temporal Attention Mechanism and Knowledge Distillation for Lip Reading | Aug 7, 2021 | Audio-Visual Speech Recognition, Knowledge Distillation | Unverified | 0 |
| Speaker-Adapted End-to-End Visual Speech Recognition for Continuous Spanish | Nov 21, 2023 | speech-recognition, Speech Recognition | Unverified | 0 |
| Streaming Audio-Visual Speech Recognition with Alignment Regularization | Nov 3, 2022 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Unverified | 0 |
| Sub-word Level Lip Reading With Visual Attention | Oct 14, 2021 | Audio-Visual Active Speaker Detection, Automatic Speech Recognition | Unverified | 0 |
| SUTAV: A Turkish Audio-Visual Database | May 1, 2012 | Audio-Visual Speech Recognition, Person Identification | Unverified | 0 |
| SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer | May 7, 2025 | Audio-Visual Speech Recognition, Lip Reading | Unverified | 0 |
| SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision | Mar 30, 2023 | Lip Reading, speech-recognition | Unverified | 0 |
| Task-dependent modulation of the visual sensory thalamus assists visual-speech recognition | May 24, 2018 | Face Identification, speech-recognition | Unverified | 0 |
| The GUA-Speech System Description for CNVSRC Challenge 2023 | Dec 12, 2023 | Decoder, Language Modeling | Unverified | 0 |
| The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction | Sep 15, 2023 | Audio-Visual Speech Recognition, speech-recognition | Unverified | 0 |
| The Multimodal Information Based Speech Processing (MISP) 2025 Challenge: Audio-Visual Diarization and Recognition | May 20, 2025 | Audio-Visual Speech Recognition, speaker-diarization | Unverified | 0 |
| The NPU-ASLP System for Audio-Visual Speech Recognition in MISP 2022 Challenge | Mar 11, 2023 | Audio-Visual Speech Recognition, speech-recognition | Unverified | 0 |
| Towards Estimating the Upper Bound of Visual-Speech Recognition: The Visual Lip-Reading Feasibility Database | Apr 26, 2017 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Towards Lipreading Sentences with Active Appearance Models | May 29, 2018 | Audio-Visual Speech Recognition, Lipreading | Unverified | 0 |
| Recurrent Neural Network Transducer for Audio-Visual Speech Recognition | Nov 8, 2019 | Audio-Visual Speech Recognition, Lipreading | Code Available | 0 |
| SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data | Aug 1, 2024 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available | 0 |
| Harnessing GANs for Zero-shot Learning of New Classes in Visual Speech Recognition | Jan 29, 2019 | speech-recognition, Speech Recognition | Code Available | 0 |
| Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation | Jan 7, 2024 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available | 0 |
| A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition | Mar 7, 2024 | Audio-Visual Speech Recognition, Knowledge Distillation | Code Available | 0 |
| Audio-Visual Speech Recognition based on Regulated Transformer and Spatio-Temporal Fusion Strategy for Driver Assistive Systems | May 9, 2024 | Audio-Visual Speech Recognition, Lipreading | Code Available | 0 |
| Listening and Seeing Again: Generative Error Correction for Audio-Visual Speech Recognition | Jan 3, 2025 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available | 0 |
| LIP-RTVE: An Audiovisual Database for Continuous Spanish in the Wild | Nov 21, 2023 | Automatic Speech Recognition, speech-recognition | Code Available | 0 |
| The NPU-ASLP System Description for Visual Speech Recognition in CNVSRC 2024 | Aug 5, 2024 | Decoder, speech-recognition | Code Available | 0 |
| Evaluation of End-to-End Continuous Spanish Lipreading in Different Data Conditions | Feb 1, 2025 | Lipreading, speech-recognition | Code Available | 0 |
| Transfer Learning from Visual Speech Recognition to Mouthing Recognition in German Sign Language | May 20, 2025 | Multi-Task Learning, Sign Language Recognition | Code Available | 0 |
| Deep word embeddings for visual speech recognition | Oct 30, 2017 | Lipreading, speech-recognition | Code Available | 0 |
| LRW-1000: A Naturally-Distributed Large-Scale Benchmark for Lip Reading in the Wild | Oct 16, 2018 | Lipreading, Lip Reading | Code Available | 0 |
| Combining Residual Networks with LSTMs for Lipreading | Mar 12, 2017 | Lipreading, Lip Reading | Code Available | 0 |
| LRS3-TED: a large-scale dataset for visual speech recognition | Sep 3, 2018 | Audio-Visual Speech Recognition, speech-recognition | Code Available | 0 |