MANNER: Multi-view Attention Network for Noise Erasure
Hyun Joon Park, Byung Ha Kang, WooSeok Shin, Jin Sob Kim, Sung Won Han
- github.com/winddori2002/MANNER (official PyTorch implementation)
Abstract
In the field of speech enhancement, time-domain methods have difficulty achieving both high performance and efficiency. Recently, dual-path models have been adopted to represent long sequential features, but they still offer limited representations and poor memory efficiency. In this study, we propose the Multi-view Attention Network for Noise ERasure (MANNER), consisting of a convolutional encoder-decoder with a multi-view attention block, applied to time-domain signals. MANNER efficiently extracts three different representations from noisy speech and estimates high-quality clean speech. We evaluated MANNER on the VoiceBank-DEMAND dataset in terms of five objective speech quality metrics. Experimental results show that MANNER achieves state-of-the-art performance while efficiently processing noisy speech.
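To make the "three different representations" idea concrete, here is a minimal, hedged sketch of a multi-view attention step over a (channels, time) feature map. The three views (a channel gate, a global time-axis attention, and a local windowed smoothing) and the averaging fusion are illustrative assumptions for exposition, not MANNER's exact block, which the paper defines with learned projections.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_view_attention(x):
    """Toy multi-view attention over a (channels, time) feature map.

    Combines three simplified "views" of the input:
      - channel view: gate each channel by a sigmoid of its temporal mean,
      - global view:  softmax attention weights over the full time axis,
      - local view:   a moving average over a small temporal window.
    A hedged sketch of the concept, not the paper's exact architecture.
    """
    c, t = x.shape
    # Channel view: squeeze-and-excitation-style per-channel gate.
    gate = 1.0 / (1.0 + np.exp(-x.mean(axis=1, keepdims=True)))
    channel_view = x * gate
    # Global view: attend over all time steps; rescale by t so
    # magnitudes stay comparable with the other views.
    global_view = x * softmax(x, axis=1) * t
    # Local view: average each channel over a 5-frame window.
    kernel = np.ones(5) / 5.0
    local_view = np.stack([np.convolve(row, kernel, mode="same") for row in x])
    # Fuse the views (the real block concatenates and projects; we average).
    return (channel_view + global_view + local_view) / 3.0

feats = np.random.randn(4, 16)  # 4 channels, 16 time frames
out = multi_view_attention(feats)
print(out.shape)  # (4, 16)
```

The output keeps the input's shape, so the block can be dropped between encoder and decoder stages without changing the surrounding tensor layout.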
Tasks
- Speech Enhancement
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| VoiceBank + DEMAND | MANNER | PESQ (wb) | 3.21 | — | Unverified |