SOTAVerified

Visual Speech Recognition

Papers

Showing 11-20 of 182 papers

| Title | Status | Hype |
| --- | --- | --- |
| MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens | Code | 1 |
| Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations | Code | 1 |
| Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models | Code | 1 |
| Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation | Code | 1 |
| AlignVSR: Audio-Visual Cross-Modal Alignment for Visual Speech Recognition | Code | 1 |
| Tailored Design of Audio-Visual Speech Recognition Models using Branchformers | Code | 1 |
| Learning Video Temporal Dynamics with Cross-Modal Attention for Robust Audio-Visual Speech Recognition | Code | 1 |
| Watch Your Mouth: Silent Speech Recognition with Depth Sensing | Code | 1 |
| It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition | Code | 1 |
| Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation | Code | 1 |

Benchmark Results

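| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | VTP with more data | Word Error Rate (WER) | 30.7 | | Unverified |
| 2 | CTC/Attention | Word Error Rate (WER) | 19.1 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | VTP with more data | Word Error Rate (WER) | 22.6 | | Unverified |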