Lipreading using Temporal Convolutional Networks

2020-01-23

Brais Martinez, Pingchuan Ma, Stavros Petridis, Maja Pantic

Code

github.com/mpc001/Lipreading_using_Temporal_Convolutional_Networks
pytorch★ 433
github.com/Yondijr/FlowerPower
pytorch★ 2

Abstract

Lip-reading has attracted considerable research attention lately thanks to advances in deep learning. The current state-of-the-art model for recognition of isolated words in the wild consists of a residual network and Bidirectional Gated Recurrent Unit (BGRU) layers. In this work, we address the limitations of this model and propose changes that further improve its performance. Firstly, the BGRU layers are replaced with Temporal Convolutional Networks (TCN). Secondly, we greatly simplify the training procedure, which allows us to train the model in a single stage. Thirdly, we show that the current state-of-the-art methodology produces models that do not generalize well to variations in sequence length, and we address this issue by proposing a variable-length augmentation. We present results on the largest publicly available datasets for isolated word recognition in English and Mandarin, LRW and LRW-1000, respectively. Our proposed model yields absolute improvements of 1.2% and 3.2%, respectively, on these datasets, setting a new state of the art.
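The abstract does not spell out how the variable-length augmentation works, so the following is only a minimal sketch of one plausible reading: during training, each clip is replaced by a random contiguous sub-sequence of itself, so the model sees inputs of varying length. The function name `variable_length_crop`, the `min_len` parameter, and the uniform sampling scheme are all assumptions made for illustration and are not taken from the paper or its code.

```python
import random


def variable_length_crop(frames, min_len, rng=None):
    """Return a random contiguous sub-sequence of `frames`.

    Hypothetical helper: the crop length is drawn uniformly from
    [min_len, len(frames)] and the start offset uniformly from the
    positions where such a crop still fits inside the clip.
    """
    rng = rng or random.Random()
    n = len(frames)
    if not 1 <= min_len <= n:
        raise ValueError("min_len must be between 1 and len(frames)")
    length = rng.randint(min_len, n)    # randint is inclusive on both ends
    start = rng.randint(0, n - length)  # keep the crop inside the clip
    return frames[start:start + length]


# Usage: integers stand in for video frames of a dummy clip.
clip = list(range(29))
cropped = variable_length_crop(clip, min_len=10, rng=random.Random(0))
print(len(cropped), cropped[0], cropped[-1])
```

Applying such a crop only at training time, and evaluating on full clips, would be one way to expose a model to varying sequence lengths; whether this matches the authors' procedure should be checked against the linked repository.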

Tasks

Lipreading, Lip Reading

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
Lip Reading in the Wild	3D Conv + ResNet-18 + MS-TCN	Top-1 Accuracy	85.3	—	Unverified
LRW-1000	3D Conv + ResNet-18 + MS-TCN	Top-1 Accuracy	41.4	—	Unverified
