E1 TTS: Simple and Fast Non-Autoregressive TTS

2024-09-14

Zhijun Liu, Shuai Wang, Pengcheng Zhu, Mengxiao Bi, Haizhou Li

Abstract

This paper introduces Easy One-Step Text-to-Speech (E1 TTS), an efficient non-autoregressive zero-shot text-to-speech system based on denoising diffusion pretraining and distribution matching distillation. The training of E1 TTS is straightforward; it does not require explicit monotonic alignment between the text and audio pairs. The inference of E1 TTS is efficient, requiring only one neural network evaluation for each utterance. Despite its sampling efficiency, E1 TTS achieves naturalness and speaker similarity comparable to various strong baseline models. Audio samples are available at http://e1tts.github.io/.

Tasks

Denoising, text-to-speech, Text to Speech
