SOTAVerified

Visual Speech Recognition

Papers

Showing 101-150 of 182 papers

| Title | Status | Hype |
| --- | --- | --- |
| CNVSRC 2024: The Second Chinese Continuous Visual Speech Recognition Challenge | | 0 |
| MKPLS: Manifold Kernel Partial Least Squares for Lipreading and Speaker Identification | | 0 |
| MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition | | 0 |
| CNVSRC 2023: The First Chinese Continuous Visual Speech Recognition Challenge | | 0 |
| MobiVSR: A Visual Speech Recognition Solution for Mobile Devices | | 0 |
| Modality Attention for End-to-End Audio-visual Speech Recognition | | 0 |
| MoHAVE: Mixture of Hierarchical Audio-Visual Experts for Robust Speech Recognition | | 0 |
| MSRS: Training Multimodal Speech Recognition Models from Scratch with Sparse Mask Optimization | | 0 |
| Chinese-LiPS: A Chinese audio-visual speech recognition dataset with Lip-reading and Presentation Slides | | 0 |
| XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception | | 0 |
| Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer | | 0 |
| Building a synchronous corpus of acoustic and 3D facial marker data for adaptive audio-visual speech synthesis | | 0 |
| Multimodal Machine Learning: Integrating Language, Vision and Speech | | 0 |
| AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations | | 0 |
| Multi-Temporal Lip-Audio Memory for Visual Speech Recognition | | 0 |
| AV-CPL: Continuous Pseudo-Labeling for Audio-Visual Speech Recognition | | 0 |
| NaturalL2S: End-to-End High-quality Multispeaker Lip-to-Speech Synthesis with Differential Digital Signal Processing | | 0 |
| "Notic My Speech" -- Blending Speech Patterns With Multimedia | | 0 |
| Auxiliary Multimodal LSTM for Audio-visual Speech Recognition and Lipreading | | 0 |
| Automated Speaker Independent Visual Speech Recognition: A Comprehensive Survey | | 0 |
| Part-based Lipreading for Audio-Visual Speech Recognition | | 0 |
| Perception Point: Identifying Critical Learning Periods in Speech for Bilingual Networks | | 0 |
| Perfect match: Improved cross-modal embeddings for audio-visual synchronisation | | 0 |
| Preliminary Test of a Real-Time, Interactive Silent Speech Interface Based on Electromagnetic Articulograph | | 0 |
| Audio-Visual Speech Recognition With A Hybrid CTC/Attention Architecture | | 0 |
| Prompt Tuning of Deep Neural Networks for Speaker-adaptive Visual Speech Recognition | | 0 |
| Quantitative Analysis of Audio-Visual Tasks: An Information-Theoretic Perspective | | 0 |
| Rate-Invariant Analysis of Trajectories on Riemannian Manifolds with Application in Visual Speech Recognition | | 0 |
| Recent Progress in the CUHK Dysarthric Speech Recognition System | | 0 |
| Recognition of Isolated Words using Zernike and MFCC features for Audio Visual Speech Recognition | | 0 |
| Adapter-Based Multi-Agent AVSR Extension for Pre-Trained ASR Models | | 0 |
| Resolution limits on visual speech recognition | | 0 |
| ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement | | 0 |
| ReVISE: Self-Supervised Speech Resynthesis With Visual Input for Universal and Generalized Speech Regeneration | | 0 |
| Audio Visual Speech Recognition using Deep Recurrent Neural Networks | | 0 |
| RUSAVIC Corpus: Russian Audio-Visual Speech in Cars | | 0 |
| Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach | | 0 |
| Audio-Visual Speech Recognition is Worth 32×32×8 Voxels | | 0 |
| SlideAVSR: A Dataset of Paper Explanation Videos for Audio-Visual Speech Recognition | | 0 |
| SparseVSR: Lightweight and Noise Robust Visual Speech Recognition | | 0 |
| Spatio-Temporal Attention Mechanism and Knowledge Distillation for Lip Reading | | 0 |
| Speaker-Adapted End-to-End Visual Speech Recognition for Continuous Spanish | | 0 |
| Streaming Audio-Visual Speech Recognition with Alignment Regularization | | 0 |
| Sub-word Level Lip Reading With Visual Attention | | 0 |
| SUTAV: A Turkish Audio-Visual Database | | 0 |
| SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer | | 0 |
| Audio-Visual Speech and Gesture Recognition by Sensors of Mobile Devices | | 0 |
| SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision | | 0 |
| Audio-visual Recognition of Overlapped speech for the LRS2 dataset | | 0 |
| Task-dependent modulation of the visual sensory thalamus assists visual-speech recognition | | 0 |

Benchmark Results

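| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | VTP with more data | Word Error Rate (WER) | 30.7 | | Unverified |
| 2 | CTC/Attention | Word Error Rate (WER) | 19.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | VTP with more data | Word Error Rate (WER) | 22.6 | | Unverified |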