SOTAVerified

Visual Speech Recognition

Papers

Showing 151–182 of 182 papers

| Title | Status | Hype |
|---|---|---|
| The GUA-Speech System Description for CNVSRC Challenge 2023 | | 0 |
| The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction | | 0 |
| The Multimodal Information Based Speech Processing (MISP) 2025 Challenge: Audio-Visual Diarization and Recognition | | 0 |
| A three-dimensional approach to Visual Speech Recognition using Discrete Cosine Transforms | | 0 |
| The NPU-ASLP System for Audio-Visual Speech Recognition in MISP 2022 Challenge | | 0 |
| Towards Estimating the Upper Bound of Visual-Speech Recognition: The Visual Lip-Reading Feasibility Database | | 0 |
| Towards Lipreading Sentences with Active Appearance Models | | 0 |
| 3D Feature Pyramid Attention Module for Robust Visual Speech Recognition | | 0 |
| Transformer-Based Video Front-Ends for Audio-Visual Speech Recognition for Single and Multi-Person Video | | 0 |
| Uncovering the Visual Contribution in Audio-Visual Speech Recognition | | 0 |
| VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning | | 0 |
| ViCocktail: Automated Multi-Modal Data Collection for Vietnamese Audio-Visual Speech Recognition | | 0 |
| Video-Based Action Recognition Using Rate-Invariant Analysis of Covariance Trajectories | | 0 |
| Visual-Aware Speech Recognition for Noisy Scenarios | | 0 |
| ASR is all you need: cross-modal distillation for lip reading | | 0 |
| Visual-Only Recognition of Normal, Whispered and Silent Speech | | 0 |
| VisualSpeaker: Visually-Guided 3D Avatar Lip Synthesis | | 0 |
| Visual Speech Recognition | | 0 |
| Visual speech recognition: aligning terminologies for better understanding | | 0 |
| Another Point of View on Visual Speech Recognition | | 0 |
| Analysis of Visual Features for Continuous Lipreading in Spanish | | 0 |
| Visual Speech Recognition in a Driver Assistance System | | 0 |
| Visual Speech Recognition Using PCA Networks and LSTMs in a Tandem GMM-HMM System | | 0 |
| Detecting Adversarial Attacks On Audiovisual Speech Recognition | | 0 |
| End-to-End Lip Reading in Romanian with Cross-Lingual Domain Adaptation and Lateral Inhibition | | 0 |
| End-to-End Visual Speech Recognition for Small-Scale Datasets | | 0 |
| End-To-End Visual Speech Recognition With LSTMs | | 0 |
| Enhancing CTC-Based Visual Speech Recognition | | 0 |
| Visual Words for Automatic Lip-Reading | | 0 |
| Fusing information streams in end-to-end audio-visual speech recognition | | 0 |
| A Multi-Purpose Audio-Visual Corpus for Multi-Modal Persian Speech Recognition: the Arman-AV Dataset | | 0 |
| Deep Visual Forced Alignment: Learning to Align Transcription with Talking Face Video | | 0 |

Benchmark Results

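| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | VTP with more data | Word Error Rate (WER) | 30.7 | | Unverified |
| 2 | CTC/Attention | Word Error Rate (WER) | 19.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | VTP with more data | Word Error Rate (WER) | 22.6 | | Unverified |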