RAD-TTS: Parallel Flow-Based TTS with Robust Alignment Learning and Diverse Synthesis

2021-06-02 · ICML Workshop INNF 2021 · Code Available

Kevin J. Shih, Rafael Valle, Rohan Badlani, Adrian Lancucki, Wei Ping, Bryan Catanzaro

Code

github.com/NVIDIA/radtts
Official · PyTorch · ★ 291

Abstract

This work introduces a predominantly parallel, end-to-end TTS model based on normalizing flows. It extends prior parallel approaches by additionally modeling speech rhythm as a separate generative distribution to facilitate variable token duration during inference. We further propose a robust framework for the online extraction of speech-text alignments, a critical yet highly unstable learning problem in end-to-end TTS frameworks. Our experiments demonstrate that the proposed techniques yield improved alignment quality and better output diversity compared to controlled baselines.
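
As a rough illustration of what "modeling speech rhythm as a separate generative distribution" can mean at inference time, the toy sketch below samples a duration for each token by pushing Gaussian noise through an invertible affine map, then repeats each token for that many frames. It is not the RAD-TTS implementation: the function names, the per-token `means` and `log_stds`, and the single affine step standing in for a learned flow are all assumptions made for illustration.

```python
import math
import random


def sample_durations(means, log_stds, rng, temperature=1.0):
    """Sample one duration (in frames) per token.

    Noise z ~ N(0, temperature^2) is mapped to a log-duration by the
    invertible affine map  log_dur = mean + exp(log_std) * z,
    a one-step stand-in for a learned normalizing flow (assumption).
    """
    durations = []
    for mean, log_std in zip(means, log_stds):
        z = rng.gauss(0.0, temperature)
        log_dur = mean + math.exp(log_std) * z
        # Every token must occupy at least one frame.
        durations.append(max(1, round(math.exp(log_dur))))
    return durations


def expand_tokens(tokens, durations):
    """Repeat each token for its sampled number of frames."""
    frames = []
    for token, dur in zip(tokens, durations):
        frames.extend([token] * dur)
    return frames


# Hypothetical inputs: four tokens with made-up distribution parameters.
tokens = ["h", "e", "l", "o"]
means = [1.0, 1.5, 1.2, 1.8]          # per-token mean of log-duration
log_stds = [-1.0, -1.0, -1.0, -1.0]   # per-token log standard deviation

rng = random.Random(0)
for _ in range(2):
    durs = sample_durations(means, log_stds, rng)
    print(durs, len(expand_tokens(tokens, durs)))
```

Because the durations are drawn from a distribution, repeated calls can produce different rhythms for the same text; setting `temperature` to 0 collapses the sketch to a single deterministic duration per token.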

Tasks

Diversity, Rhythm, Speech Synthesis, Text-To-Speech Synthesis
