SOTAVerified

Visual Speech Recognition

Papers

Showing 151–182 of 182 papers

Title | Status | Hype
Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach |  | 0
SlideAVSR: A Dataset of Paper Explanation Videos for Audio-Visual Speech Recognition |  | 0
SparseVSR: Lightweight and Noise Robust Visual Speech Recognition |  | 0
Spatio-Temporal Attention Mechanism and Knowledge Distillation for Lip Reading |  | 0
Speaker-Adapted End-to-End Visual Speech Recognition for Continuous Spanish |  | 0
Streaming Audio-Visual Speech Recognition with Alignment Regularization |  | 0
Sub-word Level Lip Reading With Visual Attention |  | 0
SUTAV: A Turkish Audio-Visual Database |  | 0
SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer |  | 0
SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision |  | 0
Task-dependent modulation of the visual sensory thalamus assists visual-speech recognition |  | 0
The GUA-Speech System Description for CNVSRC Challenge 2023 |  | 0
The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction |  | 0
The Multimodal Information Based Speech Processing (MISP) 2025 Challenge: Audio-Visual Diarization and Recognition |  | 0
The NPU-ASLP System for Audio-Visual Speech Recognition in MISP 2022 Challenge |  | 0
Towards Estimating the Upper Bound of Visual-Speech Recognition: The Visual Lip-Reading Feasibility Database |  | 0
Towards Lipreading Sentences with Active Appearance Models |  | 0
Recurrent Neural Network Transducer for Audio-Visual Speech Recognition | Code | 0
SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data | Code | 0
Harnessing GANs for Zero-shot Learning of New Classes in Visual Speech Recognition | Code | 0
Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation | Code | 0
A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition | Code | 0
Audio-Visual Speech Recognition based on Regulated Transformer and Spatio-Temporal Fusion Strategy for Driver Assistive Systems | Code | 0
Listening and Seeing Again: Generative Error Correction for Audio-Visual Speech Recognition | Code | 0
LIP-RTVE: An Audiovisual Database for Continuous Spanish in the Wild | Code | 0
The NPU-ASLP System Description for Visual Speech Recognition in CNVSRC 2024 | Code | 0
Evaluation of End-to-End Continuous Spanish Lipreading in Different Data Conditions | Code | 0
Transfer Learning from Visual Speech Recognition to Mouthing Recognition in German Sign Language | Code | 0
Deep word embeddings for visual speech recognition | Code | 0
LRW-1000: A Naturally-Distributed Large-Scale Benchmark for Lip Reading in the Wild | Code | 0
Combining Residual Networks with LSTMs for Lipreading | Code | 0
LRS3-TED: a large-scale dataset for visual speech recognition | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | VTP with more data | Word Error Rate (WER) | 30.7 |  | Unverified
2 | CTC/Attention | Word Error Rate (WER) | 19.1 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VTP with more data | Word Error Rate (WER) | 22.6 |  | Unverified