SOTAVerified

Visual Speech Recognition

Papers

Showing 76-100 of 182 papers

| Title | Status | Hype |
|---|---|---|
| Learn2Talk: 3D Talking Face Learns from 2D Talking Face | | 0 |
| DCIM-AVSR: Efficient Audio-Visual Speech Recognition via Dual Conformer Interaction Module | | 0 |
| Learning Contextually Fused Audio-visual Representations for Audio-visual Speech Recognition | | 0 |
| Continuous Speech Recognition using EEG and Video | | 0 |
| Leveraging Large Language Models in Visual Speech Recognition: Model Scaling, Context-Aware Decoding, and Iterative Polishing | | 0 |
| Leveraging Modality-specific Representations for Audio-visual Speech Recognition via Reinforcement Learning | | 0 |
| Leveraging Uni-Modal Self-Supervised Learning for Multimodal Audio-visual Speech Recognition | | 0 |
| Conformers are All You Need for Visual Speech Recognition | | 0 |
| Lightweight Operations for Visual Speech Recognition | | 0 |
| Lip2Vec: Efficient and Robust Visual Speech Recognition via Latent-to-Latent Visual to Audio Representation Mapping | | 0 |
| LipGen: Viseme-Guided Lip Video Generation for Enhancing Visual Speech Recognition | | 0 |
| Lip Graph Assisted Audio-Visual Speech Recognition Using Bidirectional Synchronous Fusion | | 0 |
| Lip-Listening: Mixing Senses to Understand Lips using Cross Modality Knowledge Distillation for Word-Based Models | | 0 |
| Lip Reading Sentences in the Wild | | 0 |
| Advances and Challenges in Deep Lip Reading | | 0 |
| Comparison of Conventional Hybrid and CTC/Attention Decoders for Continuous Visual Speech Recognition | | 0 |
| SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data | | 0 |
| Listening With Your Eyes: Towards a Practical Visual Speech Recognition System Using Deep Boltzmann Machines | | 0 |
| LiteVSR: Efficient Visual Speech Recognition by Learning from Speech Representations of Unlabeled Data | | 0 |
| Which phoneme-to-viseme maps best improve visual-only computer lip-reading? | | 0 |
| Adaptive Audio-Visual Speech Recognition via Matryoshka-Based Multimodal LLMs | | 0 |
| LRWR: Large-Scale Benchmark for Lip Reading in Russian language | | 0 |
| Manifold-Kernels Comparison in MKPLS for Visual Speech Recognition | | 0 |
| Combining Multiple Views for Visual Speech Recognition | | 0 |
| Cocktail-Party Audio-Visual Speech Recognition | | 0 |

Benchmark Results

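| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | VTP with more data | Word Error Rate (WER) | 30.7 | | Unverified |
| 2 | CTC/Attention | Word Error Rate (WER) | 19.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | VTP with more data | Word Error Rate (WER) | 22.6 | | Unverified |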