
Monaural Speech Enhancement with Complex Convolutional Block Attention Module and Joint Time Frequency Losses

2021-02-03

Shengkui Zhao, Trung Hieu Nguyen, Bin Ma


Abstract

The deep complex U-Net and the convolutional recurrent network (CRN) achieve state-of-the-art performance for monaural speech enhancement. Both are encoder-decoder structures with skip connections that rely heavily on the representation power of their complex-valued convolutional layers. In this paper, we propose a complex convolutional block attention module (CCBAM) that boosts this representation power by constructing more informative features. The CCBAM is a lightweight, general module that can easily be integrated into any complex-valued convolutional layer. We integrate the CCBAM with the deep complex U-Net and the CRN to improve their speech-enhancement performance. We further propose a mixed loss function that jointly optimizes the complex models in both the time-frequency (TF) domain and the time domain. Combining the CCBAM and the mixed loss yields a new end-to-end (E2E) complex speech enhancement framework. Ablation experiments and objective evaluations show the superior performance of the proposed approaches (https://github.com/modelscope/ClearerVoice-Studio).
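The abstract describes the CCBAM as an attention module applied to complex-valued feature maps. As a rough illustration only, and not the paper's implementation, the channel-attention half can be sketched in NumPy as follows; the magnitude-based pooling, the MLP weight shapes (`w1`, `w2`), and the choice to gate the real and imaginary parts with shared weights are all assumptions made for this sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(feat_r, feat_i, w1, w2):
    """Gate complex feature maps channel-wise (illustrative sketch).

    feat_r, feat_i: real and imaginary parts, each of shape (C, F, T).
    w1, w2: weights of a tiny two-layer MLP (hypothetical shapes).
    """
    # Pool each channel's magnitude over the time-frequency plane -> (C,)
    pooled = np.sqrt(feat_r**2 + feat_i**2).mean(axis=(1, 2))
    # Two-layer MLP with ReLU, then sigmoid gates in (0, 1) -> (C,)
    gates = sigmoid(w2 @ np.maximum(w1 @ pooled, 0.0))
    # Apply the same per-channel gate to real and imaginary parts
    return feat_r * gates[:, None, None], feat_i * gates[:, None, None]
```

Because the gates lie in (0, 1), the module rescales each channel rather than replacing it, which is what makes it cheap to drop into an existing complex convolutional layer.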
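The mixed loss combines a TF-domain term with a time-domain term. A minimal NumPy sketch of the idea, under stated assumptions: plain MSE for both terms, a naive rectangular-window STFT, and a hypothetical weight `alpha` (the paper's exact loss terms and weighting are not reproduced here):

```python
import numpy as np

def stft_mag(x, frame=256, hop=128):
    # Naive STFT magnitude via rectangular framing (illustrative only)
    frames = [x[i:i + frame] for i in range(0, len(x) - frame + 1, hop)]
    return np.abs(np.fft.rfft(np.stack(frames), axis=-1))

def mixed_loss(est, ref, alpha=0.5):
    """Weighted sum of a TF-domain and a time-domain loss (sketch)."""
    # Time-domain term: MSE on the raw waveforms
    l_time = np.mean((est - ref) ** 2)
    # TF-domain term: MSE on STFT magnitudes
    l_tf = np.mean((stft_mag(est) - stft_mag(ref)) ** 2)
    # alpha is a hypothetical mixing weight, not taken from the paper
    return alpha * l_tf + (1 - alpha) * l_time
```

Optimizing both terms jointly lets gradients constrain the spectrum and the waveform at the same time, which is the motivation the abstract gives for the mixed loss.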


Benchmark Results

| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| Deep Noise Suppression (DNS) Challenge | FRCRN | PESQ-WB | 3.23 | | Unverified |
| DNS Challenge | DCCRN-M | PESQ-NB | 3.15 | | Unverified |
| DNS Challenge | DCCRN-MC | PESQ-NB | 3.21 | | Unverified |
| DNS Challenge | DCCRN | PESQ-NB | 3.04 | | Unverified |
| VoiceBank + DEMAND | D2Former | PESQ-WB | 3.43 | | Unverified |
| WSJ0 + DEMAND + RNNoise | DCUNet-MC | PESQ-NB | 3.44 | | Unverified |
| WSJ0 + DEMAND + RNNoise | DCCRN-M | PESQ-NB | 3.28 | | Unverified |
| WSJ0 + DEMAND + RNNoise | DCUNet | PESQ-NB | 3.25 | | Unverified |
