SOTAVerified

Lip Reading

Lip reading is the task of inferring the speech content of a video using only visual information, especially lip movements. It has many important practical applications, such as assisting audio-based speech recognition, biometric authentication, and aiding hearing-impaired people.

Source: Mutual Information Maximization for Effective Lip Reading

Papers

Showing 101-125 of 153 papers

Title | Status | Hype
Disentangling Homophemes in Lip Reading using Perplexity Analysis | | 0
Learn an Effective Lip Reading Model without Pains | Code | 1
Lip-reading with Densely Connected Temporal Convolutional Networks | Code | 1
A Study on Lip Localization Techniques used for Lip reading from a Video | | 0
Seeing wake words: Audio-visual Keyword Spotting | Code | 1
Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis | Code | 1
Synchronous Bidirectional Learning for Multilingual Lip Reading | Code | 0
Seeing voices and hearing voices: learning discriminative embeddings using cross-modal self-supervision | | 0
Mutual Information Maximization for Effective Lip Reading | Code | 1
Deformation Flow Based Two-Stream Network for Lip Reading | Code | 1
Pseudo-Convolutional Policy Gradient for Sequence-to-Sequence Lip-Reading | | 0
Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition | Code | 1
Re-synchronization using the Hand Preceding Model for Multi-modal Fusion in Automatic Continuous Cued Speech Recognition | | 0
Lipreading using Temporal Convolutional Networks | Code | 1
ASR is all you need: cross-modal distillation for lip reading | | 0
Hearing Lips: Improving Lip Reading by Distilling Speech Recognizers | Code | 0
Towards Pose-invariant Lip-Reading | | 0
Spatio-Temporal Fusion Based Convolutional Sequence Learning for Lip Reading | | 0
Multi-Grained Spatio-temporal Modeling for Lip-reading | | 0
A Cascade Sequence-to-Sequence Model for Chinese Mandarin Lip Reading | | 0
Realistic Speech-Driven Facial Animation with GANs | | 0
MobiVSR: A Visual Speech Recognition Solution for Mobile Devices | | 0
Synthesising 3D Facial Motion from "In-the-Wild" Speech | | 0
Learning from Videos with Deep Convolutional LSTM Networks | | 0
An Empirical Analysis of Deep Audio-Visual Models for Speech Recognition | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Lip2Wav | WER | 14.08 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Lip2Wav | WER | 34.2 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Lip2Wav | WER | 31.26 | | Unverified
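The benchmark results are reported as word error rate (WER), where lower is better. WER is conventionally defined as (S + D + I) / N, where S, D and I are the word substitutions, deletions and insertions needed to turn the hypothesis into the reference, and N is the number of reference words. Because insertions are counted, WER can exceed 100%. The sketch below is a minimal illustration of that definition using a word-level edit distance. It assumes plain whitespace tokenization; the exact text normalization (casing, punctuation) behind the claimed numbers is not stated here, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words.

    Uses a word-level Levenshtein (edit distance) dynamic program.
    Tokenization is a plain whitespace split (an assumption; real benchmarks
    may normalize case and punctuation before scoring).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first (i - 1) reference words
    # and the first j hypothesis words; row 0 is all insertions.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # column 0: i deletions
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete a reference word
                curr[j - 1] + 1,     # insert a hypothesis word
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution ("at" -> "in") and one deletion ("now"): 2 errors / 6 words.
    wer = word_error_rate("place blue at f two now", "place blue in f two")
    print(f"WER = {100 * wer:.2f}%")  # prints "WER = 33.33%"
```

Multiplying by 100 gives the percentage form in which WER is usually reported; a single-row dynamic program is used so memory grows with the hypothesis length rather than the full reference-by-hypothesis grid.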