SOTAVerified

Visual Speech Recognition

Papers

Showing 11-20 of 182 papers

Title | Status | Hype
Chinese-LiPS: A Chinese audio-visual speech recognition dataset with Lip-reading and Presentation Slides | | 0
Visual-Aware Speech Recognition for Noisy Scenarios | | 0
MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens | Code | 1
Adaptive Audio-Visual Speech Recognition via Matryoshka-Based Multimodal LLMs | | 0
Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations | Code | 1
NaturalL2S: End-to-End High-quality Multispeaker Lip-to-Speech Synthesis with Differential Digital Signal Processing | | 0
MoHAVE: Mixture of Hierarchical Audio-Visual Experts for Robust Speech Recognition | | 0
Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models | Code | 1
Lightweight Operations for Visual Speech Recognition | | 0
mWhisper-Flamingo for Multilingual Audio-Visual Noise-Robust Speech Recognition | Code | 3

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | VTP with more data | Word Error Rate (WER) | 30.7 | | Unverified
2 | CTC/Attention | Word Error Rate (WER) | 19.1 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VTP with more data | Word Error Rate (WER) | 22.6 | | Unverified