A Modulation-Domain Loss for Neural-Network-based Real-time Speech Enhancement

2021-02-15

Code

github.com/tvuong123/ModulationDomainLoss
Official implementation (PyTorch), ★ 44

Abstract

We describe a modulation-domain loss function for deep-learning-based speech enhancement systems. Learnable spectro-temporal receptive fields (STRFs) were adapted to optimize for a speaker identification task. The learned STRFs were then used to calculate a weighted mean-squared error (MSE) in the modulation domain for training a speech enhancement system. Experiments showed that adding the modulation-domain MSE to the MSE in the spectro-temporal domain substantially improved the objective prediction of speech quality and intelligibility for real-time speech enhancement systems without incurring additional computation during inference.
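The combined objective described in the abstract can be sketched roughly as follows. This is only an illustration of the arithmetic, not the authors' implementation: the function names `modulation_features` and `combined_loss`, the random kernels standing in for learned STRFs, the uniform per-kernel weights, and the mixing factor `lam` are all assumptions made for this example. NumPy is used for readability; actual training would need an autograd framework such as the PyTorch code in the official repository.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def modulation_features(spec, kernels):
    """Correlate a (time, freq) spectrogram with each STRF-like kernel.

    spec:    array of shape (T, F)
    kernels: array of shape (K, kt, kf); placeholders for learned STRFs
    returns: array of shape (K, T - kt + 1, F - kf + 1)
    """
    # Every (kt, kf) patch of the spectrogram: shape (T', F', kt, kf).
    windows = sliding_window_view(spec, kernels.shape[1:])
    # Inner product of each patch with each kernel.
    return np.einsum("tfij,kij->ktf", windows, kernels)


def combined_loss(clean, enhanced, kernels, weights=None, lam=1.0):
    """Spectro-temporal MSE plus a weighted modulation-domain MSE.

    `weights` (one value per kernel) and `lam` are hypothetical knobs;
    the abstract does not state how the weighting is chosen.
    """
    spectral_mse = np.mean((clean - enhanced) ** 2)

    diff = modulation_features(clean, kernels) - modulation_features(enhanced, kernels)
    per_kernel_mse = np.mean(diff ** 2, axis=(1, 2))  # shape (K,)

    if weights is None:
        # Assumption: uniform weighting across kernels.
        weights = np.full(len(kernels), 1.0 / len(kernels))
    modulation_mse = np.sum(weights * per_kernel_mse)

    return float(spectral_mse + lam * modulation_mse)


# Toy data: a random "clean" spectrogram and a slightly perturbed estimate.
rng = np.random.default_rng(0)
clean = rng.standard_normal((100, 64))
enhanced = clean + 0.1 * rng.standard_normal((100, 64))
kernels = rng.standard_normal((4, 9, 9))  # stand-ins for learned STRFs

print(combined_loss(clean, enhanced, kernels))
```

Because the modulation term appears only in the training objective, the enhancement network itself is unchanged, which is consistent with the abstract's point that inference incurs no additional computation.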

Tasks

Speaker Identification, Speech Denoising, Speech Enhancement

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
Deep Noise Suppression (DNS) Challenge	RNN-Modulation	PESQ-WB	2.75	—	Unverified
VoiceBank + DEMAND	real-time-GRU	PESQ (wb)	2.82	—	Unverified
