
Robust universal neural vocoding

2018-11-15

Jaime Lorenzo-Trueba, Thomas Drugman, Javier Latorre, Thomas Merritt, Bartosz Putrycz, Roberto Barra-Chicote


Abstract

This paper introduces a robust universal neural vocoder trained on 74 speakers (of both genders) speaking 17 languages. The vocoder is shown to generate speech of consistently good quality (98% relative mean MUSHRA when compared to natural speech) regardless of whether the input spectrogram comes from a speaker, style, or recording condition seen during training or from an out-of-domain scenario. Alongside the system, we present a full text-to-speech robustness analysis of a number of implemented systems, ranging in complexity from a convolutional-neural-network-based system conditioned on linguistic features to a recurrent-neural-network-based system conditioned on mel-spectrograms. The analysis shows that the convolutional systems are prone to occasional instabilities, while the recurrent approaches are significantly more stable and capable of providing universal robustness.
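The "98% relative mean MUSHRA" figure compares the mean listener rating of the vocoded condition against the mean rating of the natural-speech reference. A minimal sketch of that computation, with hypothetical ratings (the function name and all score values below are illustrative, not taken from the paper):

```python
import numpy as np

def relative_mushra(system_scores, reference_scores):
    """Mean MUSHRA score of a system expressed as a percentage of the
    natural-reference mean. Names and data are illustrative only."""
    return 100.0 * np.mean(system_scores) / np.mean(reference_scores)

# Hypothetical listener ratings on the 0-100 MUSHRA scale
vocoded = np.array([72.0, 75.0, 70.0, 77.0])
natural = np.array([74.0, 76.0, 73.0, 77.0])
print(round(relative_mushra(vocoded, natural), 1))  # → 98.0
```

A value near 100% means listeners rated the vocoded samples nearly as highly as the unprocessed recordings.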
