SOTAVerified

Visual Speech Recognition

Papers

Showing 61-70 of 182 papers

| Title | Status | Hype |
| --- | --- | --- |
| AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations | | 0 |
| Advances and Challenges in Deep Lip Reading | | 0 |
| AV-CPL: Continuous Pseudo-Labeling for Audio-Visual Speech Recognition | | 0 |
| Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video | | 0 |
| ASR is all you need: cross-modal distillation for lip reading | | 0 |
| Deep Multimodal Representation Learning from Temporal Data | | 0 |
| Deep Multimodal Learning for Audio-Visual Speech Recognition | | 0 |
| Auxiliary Multimodal LSTM for Audio-visual Speech Recognition and Lipreading | | 0 |
| 3D Feature Pyramid Attention Module for Robust Visual Speech Recognition | | 0 |
| Learn2Talk: 3D Talking Face Learns from 2D Talking Face | | 0 |

Benchmark Results

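| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | VTP with more data | Word Error Rate (WER) | 30.7 | | Unverified |
| 2 | CTC/Attention | Word Error Rate (WER) | 19.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | VTP with more data | Word Error Rate (WER) | 22.6 | | Unverified |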