| VisualSpeaker: Visually-Guided 3D Avatar Lip Synthesis | Jul 8, 2025 | Automatic Speech RecognitionLip Reading | —Unverified | 0 |
| ViCocktail: Automated Multi-Modal Data Collection for Vietnamese Audio-Visual Speech Recognition | Jun 5, 2025 | Audio-Visual Speech Recognitionspeech-recognition | —Unverified | 0 |
| Cocktail-Party Audio-Visual Speech Recognition | Jun 2, 2025 | Audio-Visual Speech Recognitionspeech-recognition | —Unverified | 0 |
| Leveraging Large Language Models in Visual Speech Recognition: Model Scaling, Context-Aware Decoding, and Iterative Polishing | May 27, 2025 | speech-recognitionSpeech Recognition | —Unverified | 0 |
| CNVSRC 2024: The Second Chinese Continuous Visual Speech Recognition Challenge | May 27, 2025 | Diversityspeech-recognition | —Unverified | 0 |
| Transfer Learning from Visual Speech Recognition to Mouthing Recognition in German Sign Language | May 20, 2025 | Multi-Task LearningSign Language Recognition | CodeCode Available | 0 |
| Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach | May 20, 2025 | Audio-Visual Speech RecognitionMixture-of-Experts | —Unverified | 0 |
| The Multimodal Information Based Speech Processing (MISP) 2025 Challenge: Audio-Visual Diarization and Recognition | May 20, 2025 | Audio-Visual Speech Recognitionspeaker-diarization | —Unverified | 0 |
| SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer | May 7, 2025 | Audio-Visual Speech RecognitionLip Reading | —Unverified | 0 |
| CoGenAV: Versatile Audio-Visual Representation Learning via Contrastive-Generative Synchronization | May 6, 2025 | Active Speaker DetectionAudio-Visual Speech Recognition | CodeCode Available | 2 |
| Chinese-LiPS: A Chinese audio-visual speech recognition dataset with Lip-reading and Presentation Slides | Apr 21, 2025 | Audio-Visual Speech RecognitionAutomatic Speech Recognition | —Unverified | 0 |
| Visual-Aware Speech Recognition for Noisy Scenarios | Apr 9, 2025 | Audio-Visual Speech RecognitionAutomatic Speech Recognition | —Unverified | 0 |
| MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens | Mar 14, 2025 | Audio-Visual Speech RecognitionComputational Efficiency | CodeCode Available | 1 |
| Adaptive Audio-Visual Speech Recognition via Matryoshka-Based Multimodal LLMs | Mar 9, 2025 | Audio-Visual Speech RecognitionComputational Efficiency | —Unverified | 0 |
| Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations | Mar 8, 2025 | Audio-Visual Speech RecognitionMulti-Task Learning | CodeCode Available | 1 |
| NaturalL2S: End-to-End High-quality Multispeaker Lip-to-Speech Synthesis with Differential Digital Signal Processing | Feb 17, 2025 | Lip to Speech Synthesisspeech-recognition | —Unverified | 0 |
| MoHAVE: Mixture of Hierarchical Audio-Visual Experts for Robust Speech Recognition | Feb 11, 2025 | Audio-Visual Speech RecognitionComputational Efficiency | —Unverified | 0 |
| Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models | Feb 9, 2025 | Audio-Visual Speech RecognitionAutomatic Speech Recognition | CodeCode Available | 1 |
| Lightweight Operations for Visual Speech Recognition | Feb 7, 2025 | speech-recognitionSpeech Recognition | —Unverified | 0 |
| mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition | Feb 3, 2025 | Audio-Visual Speech RecognitionDecoder | CodeCode Available | 3 |
| Adapter-Based Multi-Agent AVSR Extension for Pre-Trained ASR Models | Feb 3, 2025 | Audio-Visual Speech Recognitionspeech-recognition | —Unverified | 0 |
| Evaluation of End-to-End Continuous Spanish Lipreading in Different Data Conditions | Feb 1, 2025 | Lipreadingspeech-recognition | CodeCode Available | 0 |
| Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation | Jan 23, 2025 | Audio-Visual Speech RecognitionMulti-Task Learning | CodeCode Available | 1 |
| LipGen: Viseme-Guided Lip Video Generation for Enhancing Visual Speech Recognition | Jan 8, 2025 | Lip Readingspeech-recognition | —Unverified | 0 |
| Listening and Seeing Again: Generative Error Correction for Audio-Visual Speech Recognition | Jan 3, 2025 | Audio-Visual Speech RecognitionAutomatic Speech Recognition | CodeCode Available | 0 |