MANNER: Multi-view Attention Network for Noise Erasure
Hyun Joon Park, Byung Ha Kang, WooSeok Shin, Jin Sob Kim, Sung Won Han
- github.com/winddori2002/MANNER (official PyTorch implementation)
Abstract
In the field of speech enhancement, time-domain methods have difficulty achieving both high performance and efficiency. Recently, dual-path models have been adopted to represent long sequential features, but they still offer limited representations and poor memory efficiency. In this study, we propose the Multi-view Attention Network for Noise ERasure (MANNER), consisting of a convolutional encoder-decoder with a multi-view attention block, applied to time-domain signals. MANNER efficiently extracts three different representations from noisy speech and estimates high-quality clean speech. We evaluated MANNER on the VoiceBank-DEMAND dataset in terms of five objective speech quality metrics. Experimental results show that MANNER achieves state-of-the-art performance while efficiently processing noisy speech.
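To make the "three different representations" idea concrete, here is a minimal, hedged sketch of a multi-view attention step over a (channels, time) feature map. The three views (a channel gate, a global time-axis attention, and a local windowed smoothing) and the averaging fusion are illustrative assumptions for exposition, not MANNER's exact block, which the paper defines with learned projections.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_view_attention(x):
    """Toy multi-view attention over a (channels, time) feature map.

    Combines three simplified "views" of the input:
      - channel view: gate each channel by a sigmoid of its temporal mean,
      - global view:  softmax attention weights over the full time axis,
      - local view:   a moving average over a small temporal window.
    A hedged sketch of the concept, not the paper's exact architecture.
    """
    c, t = x.shape
    # Channel view: squeeze-and-excitation-style per-channel gate.
    gate = 1.0 / (1.0 + np.exp(-x.mean(axis=1, keepdims=True)))
    channel_view = x * gate
    # Global view: attend over all time steps; rescale by t so
    # magnitudes stay comparable with the other views.
    global_view = x * softmax(x, axis=1) * t
    # Local view: average each channel over a 5-frame window.
    kernel = np.ones(5) / 5.0
    local_view = np.stack([np.convolve(row, kernel, mode="same") for row in x])
    # Fuse the views (the real block concatenates and projects; we average).
    return (channel_view + global_view + local_view) / 3.0

feats = np.random.randn(4, 16)  # 4 channels, 16 time frames
out = multi_view_attention(feats)
print(out.shape)  # (4, 16)
```

The output keeps the input's shape, so the block can be dropped between encoder and decoder stages without changing the surrounding tensor layout.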
Tasks
- Speech Enhancement
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| VoiceBank + DEMAND | MANNER | PESQ (wb) | 3.21 | — | Unverified |