| ViCocktail: Automated Multi-Modal Data Collection for Vietnamese Audio-Visual Speech Recognition | Jun 5, 2025 | Audio-Visual Speech Recognition, speech-recognition | Unverified | 0 |
| Cocktail-Party Audio-Visual Speech Recognition | Jun 2, 2025 | Audio-Visual Speech Recognition, speech-recognition | Unverified | 0 |
| Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach | May 20, 2025 | Audio-Visual Speech Recognition, Mixture-of-Experts | Unverified | 0 |
| The Multimodal Information Based Speech Processing (MISP) 2025 Challenge: Audio-Visual Diarization and Recognition | May 20, 2025 | Audio-Visual Speech Recognition, speaker-diarization | Unverified | 0 |
| SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer | May 7, 2025 | Audio-Visual Speech Recognition, Lip Reading | Unverified | 0 |
| CoGenAV: Versatile Audio-Visual Representation Learning via Contrastive-Generative Synchronization | May 6, 2025 | Active Speaker Detection, Audio-Visual Speech Recognition | Code Available | 2 |
| Chinese-LiPS: A Chinese audio-visual speech recognition dataset with Lip-reading and Presentation Slides | Apr 21, 2025 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Unverified | 0 |
| Visual-Aware Speech Recognition for Noisy Scenarios | Apr 9, 2025 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Unverified | 0 |
| MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens | Mar 14, 2025 | Audio-Visual Speech Recognition, Computational Efficiency | Code Available | 1 |
| Adaptive Audio-Visual Speech Recognition via Matryoshka-Based Multimodal LLMs | Mar 9, 2025 | Audio-Visual Speech Recognition, Computational Efficiency | Unverified | 0 |
| Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations | Mar 8, 2025 | Audio-Visual Speech Recognition, Multi-Task Learning | Code Available | 1 |
| MoHAVE: Mixture of Hierarchical Audio-Visual Experts for Robust Speech Recognition | Feb 11, 2025 | Audio-Visual Speech Recognition, Computational Efficiency | Unverified | 0 |
| Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models | Feb 9, 2025 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available | 1 |
| mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition | Feb 3, 2025 | Audio-Visual Speech Recognition, Decoder | Code Available | 3 |
| Adapter-Based Multi-Agent AVSR Extension for Pre-Trained ASR Models | Feb 3, 2025 | Audio-Visual Speech Recognition, speech-recognition | Unverified | 0 |
| Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation | Jan 23, 2025 | Audio-Visual Speech Recognition, Multi-Task Learning | Code Available | 1 |
| Listening and Seeing Again: Generative Error Correction for Audio-Visual Speech Recognition | Jan 3, 2025 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available | 0 |
| Uncovering the Visual Contribution in Audio-Visual Speech Recognition | Dec 22, 2024 | Audio-Visual Speech Recognition, Informativeness | Unverified | 0 |
| Quantitative Analysis of Audio-Visual Tasks: An Information-Theoretic Perspective | Sep 29, 2024 | Audio-Visual Speech Recognition, Lip Reading | Unverified | 0 |
| Large Language Models are Strong Audio-Visual Speech Recognition Learners | Sep 18, 2024 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available | 2 |
| DCIM-AVSR : Efficient Audio-Visual Speech Recognition via Dual Conformer Interaction Module | Aug 31, 2024 | Audio-Visual Speech Recognition, speech-recognition | Unverified | 0 |
| SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data | Aug 1, 2024 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available | 0 |
| Tailored Design of Audio-Visual Speech Recognition Models using Branchformers | Jul 9, 2024 | Audio-Visual Speech Recognition, speech-recognition | Code Available | 1 |
| Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition | Jul 4, 2024 | Audio-Visual Speech Recognition, speech-recognition | Code Available | 1 |
| MSRS: Training Multimodal Speech Recognition Models from Scratch with Sparse Mask Optimization | Jun 25, 2024 | Audio-Visual Speech Recognition, speech-recognition | Unverified | 0 |
| Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation | Jun 14, 2024 | Audio-Visual Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 3 |
| Audio-Visual Speech Recognition based on Regulated Transformer and Spatio-Temporal Fusion Strategy for Driver Assistive Systems | May 9, 2024 | Audio-Visual Speech Recognition, Lipreading | Code Available | 0 |
| Learn2Talk: 3D Talking Face Learns from 2D Talking Face | Apr 19, 2024 | Audio-Visual Speech Recognition, speech-recognition | Unverified | 0 |
| XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception | Mar 21, 2024 | Audio-Visual Speech Recognition, Representation Learning | Unverified | 0 |
| Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer | Mar 14, 2024 | Audio-Visual Speech Recognition, Robust Speech Recognition | Unverified | 0 |
| A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition | Mar 7, 2024 | Audio-Visual Speech Recognition, Knowledge Distillation | Code Available | 0 |
| It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition | Feb 8, 2024 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available | 1 |
| SlideAVSR: A Dataset of Paper Explanation Videos for Audio-Visual Speech Recognition | Jan 18, 2024 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Unverified | 0 |
| Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation | Jan 7, 2024 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available | 0 |
| MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition | Jan 7, 2024 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Unverified | 0 |
| ES3: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations | Jan 1, 2024 | Audio-Visual Speech Recognition, Lipreading | Unverified | 0 |
| RTFS-Net: Recurrent Time-Frequency Modelling for Efficient Audio-Visual Speech Separation | Sep 29, 2023 | Audio-Visual Speech Recognition, speech-recognition | Code Available | 1 |
| AV-CPL: Continuous Pseudo-Labeling for Audio-Visual Speech Recognition | Sep 29, 2023 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Unverified | 0 |
| The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction | Sep 15, 2023 | Audio-Visual Speech Recognition, speech-recognition | Unverified | 0 |
| Improving Audio-Visual Speech Recognition by Lip-Subword Correlation Based Visual Pre-training and Cross-Modal Fusion Encoder | Aug 14, 2023 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available | 1 |
| Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition | Jun 18, 2023 | Audio-Visual Speech Recognition, speech-recognition | Code Available | 1 |
| MIR-GAN: Refining Frame-Level Modality-Invariant Representations with Adversarial Network for Audio-Visual Speech Recognition | Jun 18, 2023 | Audio-Visual Speech Recognition, Representation Learning | Code Available | 1 |
| OpenSR: Open-Modality Speech Recognition via Maintaining Multi-Modality Alignment | Jun 10, 2023 | Audio-Visual Speech Recognition, Lip Reading | Code Available | 1 |
| MAVD: The First Open Large-Scale Mandarin Audio-Visual Dataset with Depth Information | Jun 4, 2023 | Audio-Visual Speech Recognition, speech-recognition | Code Available | 1 |
| Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization | May 18, 2023 | Audio-Visual Speech Recognition, Prompt Engineering | Code Available | 1 |
| Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition | May 16, 2023 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available | 1 |
| Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels | Mar 25, 2023 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available | 2 |
| Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring | Mar 15, 2023 | Audio-Visual Speech Recognition, speech-recognition | Code Available | 1 |
| The NPU-ASLP System for Audio-Visual Speech Recognition in MISP 2022 Challenge | Mar 11, 2023 | Audio-Visual Speech Recognition, speech-recognition | Unverified | 0 |
| MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation | Mar 1, 2023 | Audio-Visual Speech Recognition, Robust Speech Recognition | Code Available | 2 |