SOTAVerified

Visual Speech Recognition

Papers

Showing 21–30 of 182 papers

Title | Status | Hype
End-to-end Audio-visual Speech Recognition with Conformers | Code | 1
Do VSR Models Generalize Beyond LRS3? | Code | 1
Hearing Lips in Noise: Universal Viseme-Phoneme Mapping and Transfer for Robust Audio-Visual Speech Recognition | Code | 1
Improving Audio-Visual Speech Recognition by Lip-Subword Correlation Based Visual Pre-training and Cross-Modal Fusion Encoder | Code | 1
MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition | Code | 1
AV Taris: Online Audio-Visual Speech Recognition | Code | 1
It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition | Code | 1
Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition | Code | 1
CI-AVSR: A Cantonese Audio-Visual Speech Dataset for In-car Command Recognition | Code | 1
Lips Don't Lie: A Generalisable and Robust Approach to Face Forgery Detection | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | VTP with more data | Word Error Rate (WER) | 30.7 | - | Unverified
2 | CTC/Attention | Word Error Rate (WER) | 19.1 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VTP with more data | Word Error Rate (WER) | 22.6 | - | Unverified