SOTAVerified

Lipreading

Lipreading is the process of inferring speech by watching a speaker's lip movements without access to sound. Humans lipread constantly without noticing it. It plays a substantial role in communication, although a smaller one than audio. It is a particularly helpful skill for people who are hard of hearing.

Deep Lipreading is the process of extracting speech from a video of a silent talking face using deep neural networks. It is also known by a few other names, such as Visual Speech Recognition (VSR), Machine Lipreading, and Automatic Lipreading.

The primary methodology involves two stages: (i) extracting visual and temporal features from the sequence of image frames in a silent talking video, and (ii) decoding that sequence of features into units of speech, e.g. characters, words, or phrases. Existing implementations either train the two stages separately or train the whole pipeline end-to-end.
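Stage (ii) is commonly trained with Connectionist Temporal Classification (CTC), as the benchmark entries whose names mention CTC suggest. The sketch below illustrates only the decoding end of that stage: it assumes a visual front-end has already produced one score vector per video frame, then applies greedy (best-path) CTC decoding, which takes the top symbol per frame, merges consecutive repeats, and drops blanks. The vocabulary `VOCAB` and the helper `one_hot` are hypothetical, made up purely for illustration; this is not the method of any specific paper listed here.

```python
from typing import List, Optional, Sequence

# Hypothetical character vocabulary for illustration; index 0 is the CTC blank.
VOCAB = ["<blank>", " ", "a", "b", "i", "n", "r"]


def one_hot(index: int, size: int = len(VOCAB)) -> List[float]:
    """Build a toy per-frame score vector that puts all mass on one symbol."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]],
                      vocab: Sequence[str] = VOCAB,
                      blank_index: int = 0) -> str:
    """Best-path CTC decoding of per-frame symbol scores into a string."""
    # 1. Pick the highest-scoring symbol index for every video frame.
    best_path = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]
    # 2. Merge consecutive repeats, then 3. drop blank symbols.
    chars: List[str] = []
    previous: Optional[int] = None
    for index in best_path:
        if index != previous and index != blank_index:
            chars.append(vocab[index])
        previous = index
    return "".join(chars)


# Six toy frames whose best path is: b b a <blank> r r
frames = [one_hot(i) for i in [3, 3, 2, 0, 6, 6]]
print(ctc_greedy_decode(frames))  # prints: bar
```

A blank between two identical symbols keeps them apart, so the path `b <blank> b` decodes to "bb" while `b b` decodes to "b". Practical systems usually replace the greedy step with beam search, often combined with a language model; the greedy version is simply the smallest correct CTC decoder.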

Papers

Showing 1 to 10 of 103 papers

Title | Status | Hype
Learning Speaker-Invariant Visual Features for Lipreading | - | 0
UniCUE: Unified Recognition and Generation Framework for Chinese Cued Speech Video-to-Speech Generation | - | 0
OXSeg: Multidimensional attention UNet-based lip segmentation using semi-supervised lip contours | - | 0
Target Speaker Lipreading by Audio-Visual Self-Distillation Pretraining and Speaker Adaptation | - | 0
Audio-Visual Representation Learning via Knowledge Distillation from Speech Foundation Models | Code | 1
Evaluation of End-to-End Continuous Spanish Lipreading in Different Data Conditions | Code | 0
Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs | Code | 1
RAL:Redundancy-Aware Lipreading Model Based on Differential Learning with Symmetric Views | - | 0
SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization | Code | 2
Watch Your Mouth: Silent Speech Recognition with Depth Sensing | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Conv-seq2seq | Word Error Rate (WER) | 60.1 | - | Unverified
2 | CTC + KD | Word Error Rate (WER) | 59.8 | - | Unverified
3 | TM-seq2seq | Word Error Rate (WER) | 58.9 | - | Unverified
4 | EG-seq2seq | Word Error Rate (WER) | 57.8 | - | Unverified
5 | CTC-V2P | Word Error Rate (WER) | 55.1 | - | Unverified
6 | Hyb + Conformer | Word Error Rate (WER) | 43.3 | - | Unverified
7 | VTP | Word Error Rate (WER) | 40.6 | - | Unverified
8 | ES³ Base | Word Error Rate (WER) | 40.3 | - | Unverified
9 | ES³ Large | Word Error Rate (WER) | 37.1 | - | Unverified
10 | RNN-T | Word Error Rate (WER) | 33.6 | - | Unverified
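All results above are reported as Word Error Rate (WER), where lower is better. WER is the word-level edit distance between the predicted and reference transcripts, divided by the number of reference words: WER = 100 * (S + D + I) / N, with S substitutions, D deletions, and I insertions in a minimum-cost alignment and N reference words. The sketch below computes it with standard dynamic programming. The function name and example sentences are made up for illustration, and the listed papers may apply their own text normalization before scoring, so this is not their exact evaluation script.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER in percent: (substitutions + deletions + insertions) / reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = minimum edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dist[i - 1][j] + 1
            insertion = dist[i][j - 1] + 1
            dist[i][j] = min(substitution, deletion, insertion)
    return 100.0 * dist[len(ref)][len(hyp)] / len(ref)


# One substitution ("at" -> "in") and one deletion ("now") over 6 reference words.
print(round(word_error_rate("place blue at f two now", "place blue in f two"), 1))  # prints: 33.3
```

Because insertions are counted, WER can exceed 100% when a model outputs many spurious words.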