Evaluation of End-to-End Continuous Spanish Lipreading in Different Data Conditions

2025-02-01

David Gimeno-Gómez, Carlos-D. Martínez-Hinarejos

Code

github.com/david-gimeno/evaluating-end2end-spanish-lipreading
Official · In paper · PyTorch

Abstract

Visual speech recognition remains an open research problem: without the auditory signal, a system must cope with visual ambiguities, inter-speaker variability, and the complex modeling of silence. Nonetheless, remarkable results have recently been achieved in the field thanks to the availability of large-scale databases and the use of powerful attention mechanisms. In addition, languages other than English are now a focus of interest. This paper presents notable advances in automatic continuous lipreading for Spanish. First, an end-to-end system based on the hybrid CTC/Attention architecture is presented. Experiments are conducted on two corpora of disparate nature, reaching state-of-the-art results that significantly improve on the best performance obtained to date for both databases. A thorough ablation study then examines how the different components of the architecture influence the quality of speech recognition. Next, a rigorous error analysis investigates the factors that could affect the learning of the automatic system. Finally, a new Spanish lipreading benchmark is consolidated. Code and trained models are available at https://github.com/david-gimeno/evaluating-end2end-spanish-lipreading.
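
For readers unfamiliar with the hybrid CTC/Attention idea mentioned in the abstract, the following is a minimal sketch of how such systems commonly combine the two branches: a weighted sum of the CTC and attention losses during training, and a weighted sum of their log-probabilities when ranking hypotheses at decoding time. It is not taken from the authors' repository. The function names, the `ctc_weight` parameter, its default value of 0.1, and the example scores are all assumptions made for illustration.

```python
from typing import List, Tuple


def hybrid_loss(ctc_loss: float, attention_loss: float, ctc_weight: float = 0.1) -> float:
    """Interpolate two already-computed losses.

    ctc_weight is a hypothetical hyperparameter in [0, 1]; the value used
    in the paper is not stated in the abstract.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss


def best_hypothesis(
    hypotheses: List[Tuple[str, float, float]], ctc_weight: float = 0.1
) -> str:
    """Pick the transcription with the highest interpolated log-probability.

    Each hypothesis is (text, ctc_log_prob, attention_log_prob).
    """
    def joint_score(h: Tuple[str, float, float]) -> float:
        _, ctc_logp, att_logp = h
        return ctc_weight * ctc_logp + (1.0 - ctc_weight) * att_logp

    return max(hypotheses, key=joint_score)[0]


# Made-up scores for two candidate transcriptions of the same clip.
candidates = [
    ("hola mundo", -4.0, -2.0),
    ("ola mundo", -8.0, -1.5),
]
print(hybrid_loss(2.0, 1.0, ctc_weight=0.5))
print(best_hypothesis(candidates, ctc_weight=0.5))
```

In this toy example, the attention scores alone would prefer the second candidate, while giving the CTC branch equal weight shifts the choice to the first. That illustrates why the two branches are usually combined rather than used in isolation.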

Tasks

Lipreading, Speech Recognition, Visual Speech Recognition
