SOTAVerified

Visual Speech Recognition

Papers

Showing 101–125 of 182 papers

Title | Status | Hype
CNVSRC 2024: The Second Chinese Continuous Visual Speech Recognition Challenge |  | 0
MKPLS: Manifold Kernel Partial Least Squares for Lipreading and Speaker Identification |  | 0
MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition |  | 0
CNVSRC 2023: The First Chinese Continuous Visual Speech Recognition Challenge |  | 0
MobiVSR: A Visual Speech Recognition Solution for Mobile Devices |  | 0
Modality Attention for End-to-End Audio-visual Speech Recognition |  | 0
MoHAVE: Mixture of Hierarchical Audio-Visual Experts for Robust Speech Recognition |  | 0
MSRS: Training Multimodal Speech Recognition Models from Scratch with Sparse Mask Optimization |  | 0
Chinese-LiPS: A Chinese audio-visual speech recognition dataset with Lip-reading and Presentation Slides |  | 0
XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception |  | 0
Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer |  | 0
Building a synchronous corpus of acoustic and 3D facial marker data for adaptive audio-visual speech synthesis |  | 0
Multimodal Machine Learning: Integrating Language, Vision and Speech |  | 0
AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations |  | 0
Multi-Temporal Lip-Audio Memory for Visual Speech Recognition |  | 0
AV-CPL: Continuous Pseudo-Labeling for Audio-Visual Speech Recognition |  | 0
NaturalL2S: End-to-End High-quality Multispeaker Lip-to-Speech Synthesis with Differential Digital Signal Processing |  | 0
"Notic My Speech" -- Blending Speech Patterns With Multimedia |  | 0
Auxiliary Multimodal LSTM for Audio-visual Speech Recognition and Lipreading |  | 0
Automated Speaker Independent Visual Speech Recognition: A Comprehensive Survey |  | 0
Part-based Lipreading for Audio-Visual Speech Recognition |  | 0
Perception Point: Identifying Critical Learning Periods in Speech for Bilingual Networks |  | 0
Perfect match: Improved cross-modal embeddings for audio-visual synchronisation |  | 0
Preliminary Test of a Real-Time, Interactive Silent Speech Interface Based on Electromagnetic Articulograph |  | 0
Audio-Visual Speech Recognition With A Hybrid CTC/Attention Architecture |  | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | VTP with more data | Word Error Rate (WER) | 30.7 |  | Unverified
2 | CTC/Attention | Word Error Rate (WER) | 19.1 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VTP with more data | Word Error Rate (WER) | 22.6 |  | Unverified