SOTAVerified

Lip Reading

Lip reading is the task of inferring the speech content of a video from visual information alone, chiefly the lip movements. Its practical applications include assisting audio-based speech recognition, biometric authentication, and aiding hearing-impaired people.

Source: Mutual Information Maximization for Effective Lip Reading

Papers

Showing 150 of 153 papers

| Title | Status | Hype |
| --- | --- | --- |
| GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis | Code | 4 |
| Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing | Code | 3 |
| Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert | Code | 2 |
| Training Strategies for Improved Lip-reading | Code | 2 |
| Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction | Code | 2 |
| Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language | Code | 1 |
| Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition | Code | 1 |
| Deep Audio-Visual Speech Recognition | Code | 1 |
| Do VSR Models Generalize Beyond LRS3? | Code | 1 |
| Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition | Code | 1 |
| Visual Keyword Spotting with Attention | Code | 1 |
| Selective Listening by Synchronizing Speech with Lips | Code | 1 |
| Contrastive Learning of Global-Local Video Representations | Code | 1 |
| Learn an Effective Lip Reading Model without Pains | Code | 1 |
| Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading | Code | 1 |
| Audio-Visual Efficient Conformer for Robust Speech Recognition | Code | 1 |
| OLKAVS: An Open Large-Scale Korean Audio-Visual Speech Dataset | Code | 1 |
| End-to-End Speech-Driven Facial Animation with Temporal GANs | Code | 1 |
| Mutual Information Maximization for Effective Lip Reading | Code | 1 |
| End-to-end Audio-visual Speech Recognition with Conformers | Code | 1 |
| MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition | Code | 1 |
| LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading | Code | 1 |
| OpenSR: Open-Modality Speech Recognition via Maintaining Multi-Modality Alignment | Code | 1 |
| SelfTalk: A Self-Supervised Commutative Training Diagram to Comprehend 3D Talking Faces | Code | 1 |
| Seeing wake words: Audio-visual Keyword Spotting | Code | 1 |
| Lipreading using Temporal Convolutional Networks | Code | 1 |
| Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis | Code | 1 |
| Deformation Flow Based Two-Stream Network for Lip Reading | Code | 1 |
| Neural Text to Articulate Talk: Deep Text to Audiovisual Speech Synthesis achieving both Auditory and Photo-realism | Code | 1 |
| Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video | Code | 1 |
| Lip-reading with Densely Connected Temporal Convolutional Networks | Code | 1 |
| Talking Face Generation by Adversarially Disentangled Audio-Visual Representation | Code | 0 |
| Audio-Visual Speech Recognition based on Regulated Transformer and Spatio-Temporal Fusion Strategy for Driver Assistive Systems | Code | 0 |
| Speaker-adaptive Lip Reading with User-dependent Padding | Code | 0 |
| Synchronous Bidirectional Learning for Multilingual Lip Reading | Code | 0 |
| Transforming faces into video stories -- VideoFace2.0 | Code | 0 |
| Combining Residual Networks with LSTMs for Lipreading | Code | 0 |
| Relaxed Attention for Transformer Models | Code | 0 |
| MTGA: Multi-View Temporal Granularity Aligned Aggregation for Event-Based Lip-Reading | Code | 0 |
| Multi-Perspective LSTM for Joint Visual Representation Learning | Code | 0 |
| DualTalker: A Cross-Modal Dual Learning Approach for Speech-Driven 3D Facial Animation | Code | 0 |
| Lip Sync Matters: A Novel Multimodal Forgery Detector | Code | 0 |
| LRW-1000: A Naturally-Distributed Large-Scale Benchmark for Lip Reading in the Wild | Code | 0 |
| AuthNet: A Deep Learning based Authentication Mechanism using Temporal Facial Feature Movements | Code | 0 |
| Learning Separable Hidden Unit Contributions for Speaker-Adaptive Lip-Reading | Code | 0 |
| A Novel Interpretable and Generalizable Re-synchronization Model for Cued Speech based on a Multi-Cuer Corpus | Code | 0 |
| Lend a Hand: Semi Training-Free Cued Speech Recognition via MLLM-Driven Hand Modeling for Barrier-free Communication | Code | 0 |
| Lip2AudSpec: Speech reconstruction from silent lip movements video | Code | 0 |
| Hearing Lips: Improving Lip Reading by Distilling Speech Recognizers | Code | 0 |
| Estimating speech from lip dynamics | Code | 0 |

Benchmark Results

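| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Lip2Wav | WER | 14.08 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Lip2Wav | WER | 34.2 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Lip2Wav | WER | 31.26 | | Unverified |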