
Neural network based spectral mask estimation for acoustic beamforming

2016-03-20 · ICASSP 2016 · Code Available

Jahn Heymann, Lukas Drude, Reinhold Haeb-Umbach


Abstract

We present a neural network based approach to acoustic beamforming. The network is used to estimate spectral masks from which the Cross-Power Spectral Density matrices of speech and noise are estimated, which in turn are used to compute the beamformer coefficients. The network training is independent of the number and the geometric configuration of the microphones. We further show that it is possible to train the network on clean speech only, avoiding the need for stereo data with separated speech and noise. Two types of networks are evaluated: a small feed-forward network with only one hidden layer and a more elaborate bi-directional Long Short-Term Memory network. We compare our system with different parametric approaches to mask estimation, using different beamforming algorithms. We show that our system yields superior results, both in terms of perceptual speech quality and with respect to speech recognition error rate. The results for the simple feed-forward network are especially encouraging considering its low computational requirements.
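The pipeline described in the abstract (masks, then Cross-Power Spectral Density matrices, then beamformer coefficients) can be illustrated with a minimal NumPy sketch. This is not the authors' code. The function names, the array layout (frequency, time, microphone), the diagonal loading and the choice of an MVDR beamformer in the reference-microphone formulation are all assumptions made for illustration; the abstract only says that different beamforming algorithms are compared. The masks are taken as given arrays here, whereas in the paper they are produced by the network.

```python
import numpy as np


def masked_psd(stft, mask, eps=1e-10):
    """Mask-weighted cross-power spectral density estimate.

    stft: complex array of shape (F, T, M), multi-channel STFT (assumed layout).
    mask: real array of shape (F, T) with values in [0, 1].
    Returns an array of shape (F, M, M), one Hermitian matrix per frequency.
    """
    weighted = mask[..., None] * stft
    # Sum over time of mask * y * y^H, for every frequency bin.
    psd = np.einsum("ftm,ftn->fmn", weighted, stft.conj())
    norm = np.maximum(mask.sum(axis=1), eps)
    return psd / norm[:, None, None]


def mvdr_weights(psd_speech, psd_noise, ref=0, diag_load=1e-6, eps=1e-10):
    """MVDR weights in the reference-microphone formulation (an assumption).

    w(f) = (Phi_nn^-1 Phi_xx) u / trace(Phi_nn^-1 Phi_xx),
    where u selects the reference microphone `ref`.
    Returns an array of shape (F, M).
    """
    n_mics = psd_noise.shape[-1]
    # Diagonal loading keeps the noise matrix well conditioned (hypothetical choice).
    trace_n = np.trace(psd_noise, axis1=1, axis2=2).real
    loaded = psd_noise + (diag_load * trace_n / n_mics)[:, None, None] * np.eye(n_mics)
    numerator = np.linalg.solve(loaded, psd_speech)
    denominator = np.trace(numerator, axis1=1, axis2=2) + eps
    return numerator[:, :, ref] / denominator[:, None]


def apply_beamformer(weights, stft):
    """Compute w^H y for every time-frequency bin; returns shape (F, T)."""
    return np.einsum("fm,ftm->ft", weights.conj(), stft)
```

Given a speech mask and a noise mask of shape (F, T), a call sequence would be `masked_psd(stft, speech_mask)`, `masked_psd(stft, noise_mask)`, `mvdr_weights(...)` and then `apply_beamformer(...)`. Nothing in these functions depends on the microphone count or array geometry beyond the size of the last axis, which mirrors the abstract's point that mask estimation is independent of the microphone configuration.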
