SOTAVerified

Visual Speech Recognition

Papers

Showing 31-40 of 182 papers

Title | Status | Hype
Learn an Effective Lip Reading Model without Pains | Code | 1
Do VSR Models Generalize Beyond LRS3? | Code | 1
AlignVSR: Audio-Visual Cross-Modal Alignment for Visual Speech Recognition | Code | 1
End-to-end Audio-visual Speech Recognition with Conformers | Code | 1
Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models | Code | 1
How to Teach DNNs to Pay Attention to the Visual Modality in Speech Recognition | Code | 1
CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition | Code | 1
Jointly Learning Visual and Auditory Speech Representations from Raw Data | Code | 1
Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition | Code | 1
Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | VTP with more data | Word Error Rate (WER) | 30.7 | | Unverified
2 | CTC/Attention | Word Error Rate (WER) | 19.1 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VTP with more data | Word Error Rate (WER) | 22.6 | | Unverified