| Uncovering the Visual Contribution in Audio-Visual Speech Recognition | Dec 22, 2024 | Audio-Visual Speech Recognition, Informativeness | Unverified | 0 |
| AlignVSR: Audio-Visual Cross-Modal Alignment for Visual Speech Recognition | Oct 21, 2024 | cross-modal alignment, speech-recognition | Code Available | 1 |
| Quantitative Analysis of Audio-Visual Tasks: An Information-Theoretic Perspective | Sep 29, 2024 | Audio-Visual Speech Recognition, Lip Reading | Unverified | 0 |
| Large Language Models are Strong Audio-Visual Speech Recognition Learners | Sep 18, 2024 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available | 2 |
| Enhancing CTC-Based Visual Speech Recognition | Sep 11, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| DCIM-AVSR : Efficient Audio-Visual Speech Recognition via Dual Conformer Interaction Module | Aug 31, 2024 | Audio-Visual Speech Recognition, speech-recognition | Unverified | 0 |
| The NPU-ASLP System Description for Visual Speech Recognition in CNVSRC 2024 | Aug 5, 2024 | Decoder, speech-recognition | Code Available | 0 |
| SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data | Aug 1, 2024 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available | 0 |
| Tailored Design of Audio-Visual Speech Recognition Models using Branchformers | Jul 9, 2024 | Audio-Visual Speech Recognition, speech-recognition | Code Available | 1 |
| Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition | Jul 4, 2024 | Audio-Visual Speech Recognition, speech-recognition | Code Available | 1 |
| MSRS: Training Multimodal Speech Recognition Models from Scratch with Sparse Mask Optimization | Jun 25, 2024 | Audio-Visual Speech Recognition, speech-recognition | Unverified | 0 |
| SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization | Jun 18, 2024 | Landmark-based Lipreading, Lipreading | Code Available | 2 |
| CNVSRC 2023: The First Chinese Continuous Visual Speech Recognition Challenge | Jun 14, 2024 | speech-recognition, Speech Recognition | Unverified | 0 |
| Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation | Jun 14, 2024 | Audio-Visual Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 3 |
| Watch Your Mouth: Silent Speech Recognition with Depth Sensing | May 11, 2024 | Deep Learning, Lipreading | Code Available | 1 |
| Audio-Visual Speech Recognition based on Regulated Transformer and Spatio-Temporal Fusion Strategy for Driver Assistive Systems | May 9, 2024 | Audio-Visual Speech Recognition, Lipreading | Code Available | 0 |
| Learn2Talk: 3D Talking Face Learns from 2D Talking Face | Apr 19, 2024 | Audio-Visual Speech Recognition, speech-recognition | Unverified | 0 |
| XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception | Mar 21, 2024 | Audio-Visual Speech Recognition, Representation Learning | Unverified | 0 |
| Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer | Mar 14, 2024 | Audio-Visual Speech Recognition, Robust Speech Recognition | Unverified | 0 |
| A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition | Mar 7, 2024 | Audio-Visual Speech Recognition, Knowledge Distillation | Code Available | 0 |
| JEP-KD: Joint-Embedding Predictive Architecture Based Knowledge Distillation for Visual Speech Recognition | Mar 4, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing | Feb 23, 2024 | Lipreading, Lip Reading | Code Available | 3 |
| Comparison of Conventional Hybrid and CTC/Attention Decoders for Continuous Visual Speech Recognition | Feb 20, 2024 | Decoder, speech-recognition | Unverified | 0 |
| It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition | Feb 8, 2024 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available | 1 |
| SlideAVSR: A Dataset of Paper Explanation Videos for Audio-Visual Speech Recognition | Jan 18, 2024 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Unverified | 0 |