High Fidelity Speech Enhancement with Band-split RNN

2022-12-01

Jianwei Yu, Yi Luo, Hangting Chen, Rongzhi Gu, Chao Weng

Code

github.com/sungwon23/bsrnn
PyTorch, 129 stars

Abstract

Despite rapid progress in speech enhancement (SE) research, enhancing the quality of desired speech in environments with strong noise and interfering speakers remains challenging. In this paper, we extend the recently proposed band-split RNN (BSRNN) model to full-band SE and personalized SE (PSE) tasks. To mitigate the effects of unstable high-frequency components in full-band speech, we apply bi-directional band-level modeling to the low-frequency subbands and uni-directional band-level modeling to the high-frequency subbands. For the PSE task, we incorporate a speaker enrollment module into BSRNN to utilize target speaker information. Moreover, we use a MetricGAN discriminator (MGD) and a multi-resolution spectrogram discriminator (MRSD) to improve perceptual quality metrics. Experimental results show that our system outperforms various top-ranking SE systems, achieves state-of-the-art (SOTA) results on the DNS-2020 test set, and ranks among the top 3 in the DNS-2023 challenge.
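
The band-split idea in the abstract can be illustrated with a minimal sketch. It only shows how frequency bins might be partitioned into subbands and how each subband could be tagged for bi-directional or uni-directional band-level modeling. The bin count, band widths, cutoff, and the function names `split_bands` and `band_directions` are all hypothetical choices made for illustration; they are not taken from the paper or the linked repository, and no RNN is built here.

```python
from typing import List, Tuple

def split_bands(n_bins: int, widths: List[int]) -> List[Tuple[int, int]]:
    """Partition bins [0, n_bins) into contiguous (start, end) bands.

    Any bins left over after the listed widths form one final band.
    """
    bands = []
    start = 0
    for width in widths:
        if start >= n_bins:
            break
        end = min(start + width, n_bins)
        bands.append((start, end))
        start = end
    if start < n_bins:
        bands.append((start, n_bins))
    return bands

def band_directions(bands: List[Tuple[int, int]], cutoff_bin: int) -> List[str]:
    """Tag bands lying entirely below the cutoff as bidirectional, the rest as unidirectional."""
    return [
        "bidirectional" if end <= cutoff_bin else "unidirectional"
        for (_, end) in bands
    ]

# Hypothetical configuration (assumed values, not the paper's settings).
N_BINS = 481                              # number of frequency bins per frame
WIDTHS = [10] * 8 + [40] * 5 + [100] * 2  # narrow bands at low frequencies, wide at high
CUTOFF_BIN = 160                          # assumed low/high frequency boundary

bands = split_bands(N_BINS, WIDTHS)
directions = band_directions(bands, CUTOFF_BIN)

for (start, end), direction in zip(bands, directions):
    print(f"bins {start:3d}-{end - 1:3d}: {direction}")
```

In a full model, each subband would be projected to a feature vector and a recurrent layer would run across the band axis; the direction tag above marks where that layer would see bands on both sides versus only previously processed bands.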

Tasks

Speech Enhancement
Vocal Bursts Intensity Prediction

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
Deep Noise Suppression (DNS) Challenge	BSRNN-S + MRSD	PESQ-WB	3.53	—	Unverified
Deep Noise Suppression (DNS) Challenge	BSRNN-16k	PESQ-WB	3.45	—	Unverified
Deep Noise Suppression (DNS) Challenge	BSRNN-S	PESQ-WB	3.42	—	Unverified
Deep Noise Suppression (DNS) Challenge	BSRNN	PESQ-WB	3.32	—	Unverified
Deep Noise Suppression (DNS) Challenge	BSRNN-S + MGD	SI-SDR-WB	21.4	—	Unverified
