SOTAVerified

Visual Speech Recognition

Papers

Showing 1-10 of 182 papers

| Title | Status | Hype |
| --- | --- | --- |
| mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition | Code | 3 |
| Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation | Code | 3 |
| Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing | Code | 3 |
| CoGenAV: Versatile Audio-Visual Representation Learning via Contrastive-Generative Synchronization | Code | 2 |
| Large Language Models are Strong Audio-Visual Speech Recognition Learners | Code | 2 |
| SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization | Code | 2 |
| Auto-AVSR: Audio-Visual Speech Recognition with Automatic Labels | Code | 2 |
| MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation | Code | 2 |
| Visual Speech Recognition for Multiple Languages in the Wild | Code | 2 |
| Robust Self-Supervised Audio-Visual Speech Recognition | Code | 2 |

Benchmark Results

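| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | VTP with more data | Word Error Rate (WER) | 30.7 | | Unverified |
| 2 | CTC/Attention | Word Error Rate (WER) | 19.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | VTP with more data | Word Error Rate (WER) | 22.6 | | Unverified |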