Progressive Upsampling Audio Synthesis via Effective Adversarial Training

2019-09-25

Youngwoo Cho, Minwook Chang, Gerard Jounghyun Kim, Jaegul Choo


Abstract

This paper proposes a novel generative model called PUGAN, which progressively synthesizes high-quality audio as a raw waveform. PUGAN builds on the recently proposed idea of progressively generating higher-resolution images by stacking multiple encoder-decoder architectures. To apply this idea effectively to raw audio generation, we propose two novel modules: (1) a neural upsampling layer and (2) a sinc convolutional layer. The existing state-of-the-art model WaveGAN uses a single decoder architecture. Our model instead generates audio signals and progressively converts them to higher resolutions, while using significantly fewer parameters than WaveGAN (e.g., 20x fewer for 44.1 kHz output). Our experiments show that audio signals can be generated in real time with quality comparable to that of WaveGAN in terms of inception scores and human evaluation.
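The abstract names a sinc convolutional layer without defining it. A common formulation, assumed here and taken from SincNet (Ravanelli and Bengio), parameterizes each filter by only a low and a high cutoff frequency. Each filter is then built as the difference of two windowed sinc low-pass filters. PUGAN's exact variant may differ. The NumPy sketch below shows that construction with fixed cutoffs. In a trained model the cutoffs would be learnable parameters. The function names `sinc_bandpass_kernel` and `sinc_conv1d` are hypothetical.

```python
import numpy as np


def sinc_bandpass_kernel(f_low_hz, f_high_hz, kernel_size, sample_rate):
    """Windowed-sinc band-pass kernel (SincNet-style formulation, assumed).

    The band-pass response is the difference of two ideal low-pass filters
    with cutoffs f_high and f_low, truncated and smoothed by a Hamming window.
    """
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd so the kernel is symmetric")
    if not 0 <= f_low_hz < f_high_hz <= sample_rate / 2:
        raise ValueError("need 0 <= f_low < f_high <= Nyquist")
    # Sample indices centered on zero, e.g. -50..50 for kernel_size=101
    n = np.arange(kernel_size) - (kernel_size - 1) / 2
    # Cutoffs normalized to cycles per sample
    f1 = f_low_hz / sample_rate
    f2 = f_high_hz / sample_rate
    # np.sinc(x) = sin(pi*x)/(pi*x); 2f*sinc(2f*n) is an ideal low-pass response
    h = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    # Window to reduce ripple caused by truncating the infinite sinc
    return h * np.hamming(kernel_size)


def sinc_conv1d(x, kernels):
    """Filter a 1-D waveform with each kernel; output has shape (n_filters, len(x))."""
    return np.stack([np.convolve(x, k, mode="same") for k in kernels])


# Usage: a 440 Hz tone should pass the 100-1500 Hz filter and be
# strongly attenuated by the 3000-6000 Hz filter.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)
bank = [sinc_bandpass_kernel(100, 1500, 101, sr),
        sinc_bandpass_kernel(3000, 6000, 101, sr)]
out = sinc_conv1d(tone, bank)
energy = (out ** 2).sum(axis=1)  # per-filter output energy
```

Each filter has only two free values (its cutoffs), however long the kernel is. That is the usual motivation for sinc layers over unconstrained convolution kernels.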

Tasks

Audio Generation, Audio Synthesis, Decoder
