SOTAVerified

Visual Speech Recognition

Papers

Showing 51–100 of 182 papers

Title | Status | Hype
Learning Contextually Fused Audio-visual Representations for Audio-visual Speech Recognition | | 0
Leveraging Large Language Models in Visual Speech Recognition: Model Scaling, Context-Aware Decoding, and Iterative Polishing | | 0
Leveraging Modality-specific Representations for Audio-visual Speech Recognition via Reinforcement Learning | | 0
Leveraging Uni-Modal Self-Supervised Learning for Multimodal Audio-visual Speech Recognition | | 0
Lightweight Operations for Visual Speech Recognition | | 0
Lip2Vec: Efficient and Robust Visual Speech Recognition via Latent-to-Latent Visual to Audio Representation Mapping | | 0
LipGen: Viseme-Guided Lip Video Generation for Enhancing Visual Speech Recognition | | 0
Lip Graph Assisted Audio-Visual Speech Recognition Using Bidirectional Synchronous Fusion | | 0
Lip-Listening: Mixing Senses to Understand Lips using Cross Modality Knowledge Distillation for Word-Based Models | | 0
Lip Reading Sentences in the Wild | | 0
Listening With Your Eyes: Towards a Practical Visual Speech Recognition System Using Deep Boltzmann Machines | | 0
LiteVSR: Efficient Visual Speech Recognition by Learning from Speech Representations of Unlabeled Data | | 0
LRWR: Large-Scale Benchmark for Lip Reading in Russian language | | 0
Manifold-Kernels Comparison in MKPLS for Visual Speech Recognition | | 0
MKPLS: Manifold Kernel Partial Least Squares for Lipreading and Speaker Identification | | 0
MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition | | 0
MobiVSR: A Visual Speech Recognition Solution for Mobile Devices | | 0
Modality Attention for End-to-End Audio-visual Speech Recognition | | 0
MoHAVE: Mixture of Hierarchical Audio-Visual Experts for Robust Speech Recognition | | 0
MSRS: Training Multimodal Speech Recognition Models from Scratch with Sparse Mask Optimization | | 0
Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer | | 0
Multimodal Machine Learning: Integrating Language, Vision and Speech | | 0
Multi-Temporal Lip-Audio Memory for Visual Speech Recognition | | 0
NaturalL2S: End-to-End High-quality Multispeaker Lip-to-Speech Synthesis with Differential Digital Signal Processing | | 0
"Notic My Speech" -- Blending Speech Patterns With Multimedia | | 0
Part-based Lipreading for Audio-Visual Speech Recognition | | 0
Perception Point: Identifying Critical Learning Periods in Speech for Bilingual Networks | | 0
Perfect match: Improved cross-modal embeddings for audio-visual synchronisation | | 0
Preliminary Test of a Real-Time, Interactive Silent Speech Interface Based on Electromagnetic Articulograph | | 0
Prompt Tuning of Deep Neural Networks for Speaker-adaptive Visual Speech Recognition | | 0
Quantitative Analysis of Audio-Visual Tasks: An Information-Theoretic Perspective | | 0
Rate-Invariant Analysis of Trajectories on Riemannian Manifolds with Application in Visual Speech Recognition | | 0
Recent Progress in the CUHK Dysarthric Speech Recognition System | | 0
Recognition of Isolated Words using Zernike and MFCC features for Audio Visual Speech Recognition | | 0
Resolution limits on visual speech recognition | | 0
ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement | | 0
ReVISE: Self-Supervised Speech Resynthesis With Visual Input for Universal and Generalized Speech Regeneration | | 0
JEP-KD: Joint-Embedding Predictive Architecture Based Knowledge Distillation for Visual Speech Recognition | | 0
3D Feature Pyramid Attention Module for Robust Visual Speech Recognition | | 0
Adapter-Based Multi-Agent AVSR Extension for Pre-Trained ASR Models | | 0
Adaptive Audio-Visual Speech Recognition via Matryoshka-Based Multimodal LLMs | | 0
Advances and Challenges in Deep Lip Reading | | 0
AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model | | 0
A Multi-Purpose Audio-Visual Corpus for Multi-Modal Persian Speech Recognition: the Arman-AV Dataset | | 0
Analysis of Visual Features for Continuous Lipreading in Spanish | | 0
Another Point of View on Visual Speech Recognition | | 0
ASR is all you need: cross-modal distillation for lip reading | | 0
A three-dimensional approach to Visual Speech Recognition using Discrete Cosine Transforms | | 0
Audio-visual Recognition of Overlapped speech for the LRS2 dataset | | 0
Audio-Visual Speech and Gesture Recognition by Sensors of Mobile Devices | | 0
Page 2 of 4

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | VTP with more data | Word Error Rate (WER) | 30.7 | | Unverified
2 | CTC/Attention | Word Error Rate (WER) | 19.1 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VTP with more data | Word Error Rate (WER) | 22.6 | | Unverified