Automated Dubbing and Facial Synchronization using Deep Learning
Saad Ahmed Bazaz, AbdurRehman Subhani, Syed Zohair Abbas Hadi
Code: github.com/grayhatdevelopers/deepdubpaddle
Abstract
With the recent global boom in video content creation and consumption during the pandemic, language remains the only barrier to producing immersive content for global communities. To overcome it, content creators rely on a manual dubbing process in which voice actors are hired to record a “voiceover” for the video. We aim to break down the language barrier and thus make “videos for everyone”. We propose an end-to-end architecture that uses deep learning models to automatically translate videos into a specified target language and produce synchronized dubbed voices. Our architecture is modular, allowing the user to tweak each component or replace it with a better one. We present results from this architecture and describe possible future directions for scaling it to multiple languages and use cases. A sample of our results can be found here: https://youtu.be/eGB-gL6bDr4
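The abstract does not list the individual components, so the following is only a minimal sketch of the modular idea. It assumes a four-stage chain (speech recognition, translation, speech synthesis, lip synchronization) that passes a shared job record from one stage to the next. Every name here (`DubbingJob`, `DubbingPipeline`, the `*_stub` stages, `build_default_pipeline`) is hypothetical, and the stubs return placeholder values instead of calling any real model.

```python
import os
from dataclasses import dataclass, replace
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class DubbingJob:
    """Immutable state handed from stage to stage (hypothetical fields)."""
    video_path: str
    target_language: str
    transcript: Optional[str] = None
    translation: Optional[str] = None
    dubbed_audio: Optional[bytes] = None
    output_path: Optional[str] = None
    log: Tuple[str, ...] = ()


# A stage is any callable that takes a job and returns an updated copy.
Stage = Callable[[DubbingJob], DubbingJob]


def transcribe_stub(job: DubbingJob) -> DubbingJob:
    # Placeholder for a speech-recognition model: returns fixed text.
    return replace(job, transcript="hello world", log=job.log + ("transcribe",))


def translate_stub(job: DubbingJob) -> DubbingJob:
    # Placeholder for machine translation: only tags the text with the language.
    text = f"[{job.target_language}] {job.transcript}"
    return replace(job, translation=text, log=job.log + ("translate",))


def synthesize_stub(job: DubbingJob) -> DubbingJob:
    # Placeholder for text-to-speech: encodes the text as fake "audio" bytes.
    if job.translation is None:
        raise ValueError("synthesis needs a translation")
    audio = job.translation.encode("utf-8")
    return replace(job, dubbed_audio=audio, log=job.log + ("synthesize",))


def lipsync_stub(job: DubbingJob) -> DubbingJob:
    # Placeholder for lip synchronization: only derives an output file name.
    root, ext = os.path.splitext(job.video_path)
    out = f"{root}.{job.target_language}{ext}"
    return replace(job, output_path=out, log=job.log + ("lipsync",))


class DubbingPipeline:
    """Runs named stages in insertion order; any stage can be swapped out."""

    def __init__(self, stages: Dict[str, Stage]) -> None:
        self.stages: Dict[str, Stage] = dict(stages)  # dicts keep insertion order

    def swap(self, name: str, stage: Stage) -> None:
        # Replace one component without touching the others.
        if name not in self.stages:
            raise KeyError(f"unknown stage: {name}")
        self.stages[name] = stage

    def run(self, job: DubbingJob) -> DubbingJob:
        for stage in self.stages.values():
            job = stage(job)
        return job


def build_default_pipeline() -> DubbingPipeline:
    return DubbingPipeline({
        "transcribe": transcribe_stub,
        "translate": translate_stub,
        "synthesize": synthesize_stub,
        "lipsync": lipsync_stub,
    })


if __name__ == "__main__":
    pipeline = build_default_pipeline()
    result = pipeline.run(DubbingJob("talk.mp4", "es"))
    print(result.output_path)  # talk.es.mp4
```

Keying stages by name in an ordered dictionary lets a caller replace a single component, for example `pipeline.swap("translate", better_translator)` with a hypothetical `better_translator`, while the remaining stages stay unchanged. This mirrors the tweak-or-replace flexibility described in the abstract.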