| Leveraging Large Language Models in Visual Speech Recognition: Model Scaling, Context-Aware Decoding, and Iterative Polishing | May 27, 2025 | speech-recognitionSpeech Recognition | —Unverified | 0 |
| CNVSRC 2024: The Second Chinese Continuous Visual Speech Recognition Challenge | May 27, 2025 | Diversityspeech-recognition | —Unverified | 0 |
| Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach | May 20, 2025 | Audio-Visual Speech RecognitionMixture-of-Experts | —Unverified | 0 |
| The Multimodal Information Based Speech Processing (MISP) 2025 Challenge: Audio-Visual Diarization and Recognition | May 20, 2025 | Audio-Visual Speech Recognitionspeaker-diarization | —Unverified | 0 |
| Transfer Learning from Visual Speech Recognition to Mouthing Recognition in German Sign Language | May 20, 2025 | Multi-Task LearningSign Language Recognition | CodeCode Available | 0 |
| SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer | May 7, 2025 | Audio-Visual Speech RecognitionLip Reading | —Unverified | 0 |
| Chinese-LiPS: A Chinese audio-visual speech recognition dataset with Lip-reading and Presentation Slides | Apr 21, 2025 | Audio-Visual Speech RecognitionAutomatic Speech Recognition | —Unverified | 0 |
| Visual-Aware Speech Recognition for Noisy Scenarios | Apr 9, 2025 | Audio-Visual Speech RecognitionAutomatic Speech Recognition | —Unverified | 0 |
| Adaptive Audio-Visual Speech Recognition via Matryoshka-Based Multimodal LLMs | Mar 9, 2025 | Audio-Visual Speech RecognitionComputational Efficiency | —Unverified | 0 |
| NaturalL2S: End-to-End High-quality Multispeaker Lip-to-Speech Synthesis with Differential Digital Signal Processing | Feb 17, 2025 | Lip to Speech Synthesisspeech-recognition | —Unverified | 0 |
| MoHAVE: Mixture of Hierarchical Audio-Visual Experts for Robust Speech Recognition | Feb 11, 2025 | Audio-Visual Speech RecognitionComputational Efficiency | —Unverified | 0 |
| Lightweight Operations for Visual Speech Recognition | Feb 7, 2025 | speech-recognitionSpeech Recognition | —Unverified | 0 |
| Adapter-Based Multi-Agent AVSR Extension for Pre-Trained ASR Models | Feb 3, 2025 | Audio-Visual Speech Recognitionspeech-recognition | —Unverified | 0 |
| Evaluation of End-to-End Continuous Spanish Lipreading in Different Data Conditions | Feb 1, 2025 | Lipreadingspeech-recognition | CodeCode Available | 0 |
| LipGen: Viseme-Guided Lip Video Generation for Enhancing Visual Speech Recognition | Jan 8, 2025 | Lip Readingspeech-recognition | —Unverified | 0 |
| Listening and Seeing Again: Generative Error Correction for Audio-Visual Speech Recognition | Jan 3, 2025 | Audio-Visual Speech RecognitionAutomatic Speech Recognition | CodeCode Available | 0 |
| Uncovering the Visual Contribution in Audio-Visual Speech Recognition | Dec 22, 2024 | Audio-Visual Speech RecognitionInformativeness | —Unverified | 0 |
| Quantitative Analysis of Audio-Visual Tasks: An Information-Theoretic Perspective | Sep 29, 2024 | Audio-Visual Speech RecognitionLip Reading | —Unverified | 0 |
| Enhancing CTC-Based Visual Speech Recognition | Sep 11, 2024 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | —Unverified | 0 |
| DCIM-AVSR : Efficient Audio-Visual Speech Recognition via Dual Conformer Interaction Module | Aug 31, 2024 | Audio-Visual Speech Recognitionspeech-recognition | —Unverified | 0 |
| The NPU-ASLP System Description for Visual Speech Recognition in CNVSRC 2024 | Aug 5, 2024 | Decoderspeech-recognition | CodeCode Available | 0 |
| SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data | Aug 1, 2024 | Audio-Visual Speech RecognitionAutomatic Speech Recognition | CodeCode Available | 0 |
| MSRS: Training Multimodal Speech Recognition Models from Scratch with Sparse Mask Optimization | Jun 25, 2024 | Audio-Visual Speech Recognitionspeech-recognition | —Unverified | 0 |
| CNVSRC 2023: The First Chinese Continuous Visual Speech Recognition Challenge | Jun 14, 2024 | speech-recognitionSpeech Recognition | —Unverified | 0 |
| Audio-Visual Speech Recognition based on Regulated Transformer and Spatio-Temporal Fusion Strategy for Driver Assistive Systems | May 9, 2024 | Audio-Visual Speech RecognitionLipreading | CodeCode Available | 0 |