SOTAVerified

Visual Speech Recognition

Papers

Showing 51–100 of 182 papers

Title | Status | Hype
Audio-Visual Speech Recognition based on Regulated Transformer and Spatio-Temporal Fusion Strategy for Driver Assistive Systems | Code | 0
Combining Residual Networks with LSTMs for Lipreading | Code | 0
Deep word embeddings for visual speech recognition | Code | 0
Evaluation of End-to-End Continuous Spanish Lipreading in Different Data Conditions | Code | 0
Harnessing GANs for Zero-shot Learning of New Classes in Visual Speech Recognition | Code | 0
LIP-RTVE: An Audiovisual Database for Continuous Spanish in the Wild | Code | 0
SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data | Code | 0
LRS3-TED: a large-scale dataset for visual speech recognition | Code | 0
LRW-1000: A Naturally-Distributed Large-Scale Benchmark for Lip Reading in the Wild | Code | 0
Multichannel AV-wav2vec2: A Framework for Learning Multichannel Multi-Modal Speech Representation | Code | 0
Recurrent Neural Network Transducer for Audio-Visual Speech Recognition | Code | 0
Transfer Learning from Visual Speech Recognition to Mouthing Recognition in German Sign Language | Code | 0
Deep Multimodal Representation Learning from Temporal Data | | 0
Deep Multimodal Learning for Audio-Visual Speech Recognition | | 0
Improving the Gap in Visual Speech Recognition Between Normal and Silent Speech Based on Metric Learning | | 0
Interactive decoding of words from visual speech recognition models | | 0
Investigating the Lombard Effect Influence on End-to-End Audio-Visual Speech Recognition | | 0
Is Lip Region-of-Interest Sufficient for Lipreading? | | 0
Deep Lip Reading: a comparison of models and an online application | | 0
JEP-KD: Joint-Embedding Predictive Architecture Based Knowledge Distillation for Visual Speech Recognition | | 0
Deep Learning for Visual Speech Analysis: A Survey | | 0
Kaggle Competition: Cantonese Audio-Visual Speech Recognition for In-car Commands | | 0
Deep Learning-based Spatio Temporal Facial Feature Visual Speech Recognition | | 0
Large-Scale Visual Speech Recognition | | 0
Large-vocabulary Audio-visual Speech Recognition in Noisy Environments | | 0
Learn2Talk: 3D Talking Face Learns from 2D Talking Face | | 0
DCIM-AVSR : Efficient Audio-Visual Speech Recognition via Dual Conformer Interaction Module | | 0
Learning Contextually Fused Audio-visual Representations for Audio-visual Speech Recognition | | 0
Continuous Speech Recognition using EEG and Video | | 0
Leveraging Large Language Models in Visual Speech Recognition: Model Scaling, Context-Aware Decoding, and Iterative Polishing | | 0
Leveraging Modality-specific Representations for Audio-visual Speech Recognition via Reinforcement Learning | | 0
Leveraging Uni-Modal Self-Supervised Learning for Multimodal Audio-visual Speech Recognition | | 0
Conformers are All You Need for Visual Speech Recognition | | 0
Lightweight Operations for Visual Speech Recognition | | 0
Lip2Vec: Efficient and Robust Visual Speech Recognition via Latent-to-Latent Visual to Audio Representation Mapping | | 0
LipGen: Viseme-Guided Lip Video Generation for Enhancing Visual Speech Recognition | | 0
Lip Graph Assisted Audio-Visual Speech Recognition Using Bidirectional Synchronous Fusion | | 0
Lip-Listening: Mixing Senses to Understand Lips using Cross Modality Knowledge Distillation for Word-Based Models | | 0
Lip Reading Sentences in the Wild | | 0
AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model | | 0
Comparison of Conventional Hybrid and CTC/Attention Decoders for Continuous Visual Speech Recognition | | 0
Advances and Challenges in Deep Lip Reading | | 0
Listening With Your Eyes: Towards a Practical Visual Speech Recognition System Using Deep Boltzmann Machines | | 0
LiteVSR: Efficient Visual Speech Recognition by Learning from Speech Representations of Unlabeled Data | | 0
Which phoneme-to-viseme maps best improve visual-only computer lip-reading? | | 0
Adaptive Audio-Visual Speech Recognition via Matryoshka-Based Multimodal LLMs | | 0
LRWR: Large-Scale Benchmark for Lip Reading in Russian language | | 0
Manifold-Kernels Comparison in MKPLS for Visual Speech Recognition | | 0
Combining Multiple Views for Visual Speech Recognition | | 0
Cocktail-Party Audio-Visual Speech Recognition | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | VTP with more data | Word Error Rate (WER) | 30.7 | | Unverified
2 | CTC/Attention | Word Error Rate (WER) | 19.1 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VTP with more data | Word Error Rate (WER) | 22.6 | | Unverified