WaveFlow: A Compact Flow-based Model for Raw Audio

2019-12-03 · ICML 2020

Wei Ping, Kainan Peng, Kexin Zhao, Zhao Song


Code

github.com/PaddlePaddle/Parakeet (official, in paper) · PaddlePaddle · ★ 0
github.com/L0SG/NanoFlow · PyTorch · ★ 67
github.com/caillonantoine/waveflow · PyTorch · ★ 4
github.com/L0SG/WaveFlow · PyTorch · ★ 0

Abstract

In this work, we propose WaveFlow, a small-footprint generative flow for raw audio, which is directly trained with maximum likelihood. It handles the long-range structure of 1-D waveforms with a dilated 2-D convolutional architecture, while modeling local variations using expressive autoregressive functions. WaveFlow provides a unified view of likelihood-based models for 1-D data, including WaveNet and WaveGlow as special cases. It generates speech with fidelity comparable to WaveNet, while synthesizing several orders of magnitude faster, as it requires only a few sequential steps to generate very long waveforms with hundreds of thousands of time-steps. Furthermore, it significantly reduces the likelihood gap that has existed between autoregressive models and flow-based models for efficient synthesis. Finally, our small-footprint WaveFlow has only 5.91M parameters, which is 15× smaller than WaveGlow. It can generate 22.05 kHz high-fidelity audio 42.6× faster than real-time (at a rate of 939.3 kHz) on a V100 GPU without engineered inference kernels.
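The abstract's central mechanism (a 1-D waveform treated as a 2-D array, autoregressive over a short axis so that generation needs only a few sequential steps) can be illustrated with a toy, dependency-free sketch. Several things here are assumptions, not details stated above: the squeeze layout `X[i][j] = x[j*h + i]` (h adjacent samples share a column), the affine form of each autoregressive step, and the standard Gaussian prior. `toy_conditioner` is a hypothetical stand-in for the dilated 2-D convolutional network, and all function names are illustrative only. The last line simply checks the abstract's arithmetic: 939.3 kHz / 22.05 kHz ≈ 42.6× real-time.

```python
import math
import random

def squeeze(x, h):
    """Reshape a 1-D waveform (length divisible by h) into an h x w matrix.
    Assumed layout: column j holds the h consecutive samples x[j*h : j*h+h]."""
    assert len(x) % h == 0, "waveform length must be divisible by h"
    w = len(x) // h
    return [[x[j * h + i] for j in range(w)] for i in range(h)]

def unsqueeze(X):
    """Inverse of squeeze: read the matrix back column by column."""
    h, w = len(X), len(X[0])
    return [X[i][j] for j in range(w) for i in range(h)]

def toy_conditioner(prev_rows, w):
    """Hypothetical stand-in for the dilated 2-D conv net: returns per-column
    (log_scale, shift) computed only from rows above the current one."""
    if not prev_rows:
        return [0.0] * w, [0.0] * w
    last = prev_rows[-1]
    return [0.1 * math.tanh(v) for v in last], [0.5 * v for v in last]

def forward(X, cond=toy_conditioner):
    """x -> z. Each row depends only on the known rows X[:i], so in training
    all h rows could be computed in parallel; a loop is used here for clarity."""
    w = len(X[0])
    Z, logdet = [], 0.0
    for i in range(len(X)):
        log_s, t = cond(X[:i], w)
        Z.append([X[i][j] * math.exp(log_s[j]) + t[j] for j in range(w)])
        logdet += sum(log_s)  # log|det Jacobian| of the affine step
    return Z, logdet

def inverse(Z, cond=toy_conditioner):
    """z -> x. Synthesis takes h sequential steps: row i needs rows generated so far."""
    w = len(Z[0])
    X = []
    for i in range(len(Z)):
        log_s, t = cond(X, w)
        X.append([(Z[i][j] - t[j]) * math.exp(-log_s[j]) for j in range(w)])
    return X

def log_likelihood(x, h, cond=toy_conditioner):
    """Exact log p(x) under an assumed standard Gaussian prior on z."""
    Z, logdet = forward(squeeze(x, h), cond)
    log_pz = sum(-0.5 * z * z - 0.5 * math.log(2 * math.pi) for row in Z for z in row)
    return log_pz + logdet

def realtime_factor(synthesis_rate_khz, sample_rate_khz):
    return synthesis_rate_khz / sample_rate_khz

if __name__ == "__main__":
    random.seed(0)
    x = [random.gauss(0.0, 1.0) for _ in range(32)]
    h = 8  # hypothetical; far fewer sequential steps than samples
    Z, _ = forward(squeeze(x, h))
    x_rec = unsqueeze(inverse(Z))
    assert max(abs(a - b) for a, b in zip(x, x_rec)) < 1e-9
    print(h, "sequential steps instead of", len(x))  # 8 sequential steps instead of 32
    print(round(realtime_factor(939.3, 22.05), 1))   # 42.6
```

In this sketch the number of sequential synthesis steps equals h rather than the waveform length, which is the trade-off the abstract describes: with h equal to the full length the model becomes fully autoregressive (WaveNet-like), while a single row with no dependence on earlier rows removes the sequential loop.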

Tasks

GPU, Speech Synthesis

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
LibriTTS	WaveFlow	PESQ	3.03	—	Unverified
