UP-Cycle-SENet: Unpaired Phase-aware Speech Enhancement Using Deep Complex Cycle Adversarial Networks
Cheolhoon Park, Hyunduck Choi
Abstract
Speech enhancement (SE), which reconstructs intelligible speech by removing noise or interference from noisy speech, plays an important role in many speech applications. The introduction of deep learning to SE has brought significant performance improvements over traditional methods. However, most spectrogram-based deep SE networks have two main issues. First, many existing methods estimate only the magnitude of the spectrogram and reuse the phase of the noisy input. Reusing the noisy phase imposes a clear performance limit, which becomes more pronounced when the distortion caused by noise is severe. Second, most deep SE models adopt supervised learning, which requires large amounts of paired data. Constructing a large paired dataset that includes clean speech is highly impractical due to the significant effort and cost involved. To address these issues, we propose UP-Cycle-SENet, an end-to-end complex SE network capable of estimating both the magnitude and phase parts of the speech spectrogram under unpaired dataset conditions. The proposed network leverages complex convolutional neural networks and extended modules to efficiently extract features in the complex domain without losing information. Additionally, a CNN-based discriminator with non-autoregressive properties makes the network suitable for fast training and inference. To validate the benefits of the proposed network, comparative experiments were conducted using public datasets that mix Voice Bank and DEMAND. The experimental results demonstrate that the proposed framework outperforms previous methods in both parallel and non-parallel settings.
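The phase-reuse limitation described in the abstract can be illustrated with a small numerical sketch. This is not the authors' network or evaluation protocol. It is a toy example under stated assumptions: a synthetic harmonic signal stands in for clean speech, white Gaussian noise stands in for interference, and a single FFT over the whole signal stands in for a spectrogram. The function names `snr_db` and `oracle_magnitude_with_noisy_phase` are hypothetical. Even with a perfect (oracle) magnitude estimate, combining it with the noisy phase leaves a reconstruction error that grows with the noise level.

```python
import numpy as np


def snr_db(reference, estimate):
    """Signal-to-noise ratio of `estimate` against `reference`, in dB."""
    error = reference - estimate
    return 10.0 * np.log10(np.sum(reference ** 2) / (np.sum(error ** 2) + 1e-12))


def oracle_magnitude_with_noisy_phase(clean, noisy):
    """Combine the true clean magnitude with the phase of the noisy signal.

    This mimics the best case for a magnitude-only enhancer: the magnitude
    is estimated perfectly, but the phase is copied from the noisy input.
    """
    clean_spec = np.fft.rfft(clean)
    noisy_spec = np.fft.rfft(noisy)
    mixed_spec = np.abs(clean_spec) * np.exp(1j * np.angle(noisy_spec))
    return np.fft.irfft(mixed_spec, n=len(clean))


# Toy "clean speech": 20 harmonics of 200 Hz, one second at 16 kHz (assumption).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
clean = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in range(1, 21))

rng = np.random.default_rng(0)
results = {}
for noise_scale in (0.1, 1.0):
    noisy = clean + noise_scale * rng.standard_normal(len(clean))
    estimate = oracle_magnitude_with_noisy_phase(clean, noisy)
    results[noise_scale] = snr_db(clean, estimate)

for noise_scale, value in results.items():
    print(f"noise scale {noise_scale}: SNR with oracle magnitude + noisy phase = {value:.1f} dB")
```

In this sketch the achievable SNR drops as the noise scale rises, even though the magnitude is exact, which is the ceiling that a network estimating both magnitude and phase is meant to lift.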