
SepMamba: State-space models for speaker separation using Mamba

2024-10-28 · Code Available

Thor Højhus Avenstrup, Boldizsár Elek, István László Mádi, András Bence Schin, Morten Mørup, Bjørn Sand Jensen, Kenny Falkær Olsen

Abstract

Deep learning-based single-channel speaker separation has improved significantly in recent years, largely due to the introduction of the transformer-based attention mechanism. However, these improvements come at the expense of intense computational demands, precluding their use in many practical applications. As a computationally efficient alternative with similar modeling capabilities, Mamba was recently introduced. We propose SepMamba, a U-Net-based architecture composed primarily of bidirectional Mamba layers. We find that our approach outperforms prominent models of similar size, including transformer-based models, on the WSJ0 2-speaker dataset, while enjoying a significant reduction in computational cost, memory usage, and forward-pass time. We additionally report strong results for causal variants of SepMamba. Our approach provides a computationally favorable alternative to transformer-based architectures for deep speech separation.
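
As a sketch of what the abstract's core building block, a bidirectional Mamba layer, can look like in practice, consider the following. This is a minimal illustration built on the public `mamba_ssm` package, not SepMamba's actual implementation: the class name `BiMambaLayer`, the concatenate-and-project fusion of the two scan directions, and the residual-plus-LayerNorm placement are all assumptions.

```python
# Hypothetical bidirectional Mamba layer: two independent Mamba SSMs scan the
# sequence left-to-right and right-to-left, and their outputs are fused.
import torch
import torch.nn as nn
from mamba_ssm import Mamba


class BiMambaLayer(nn.Module):
    """Runs a Mamba SSM over the sequence in both directions and fuses them."""

    def __init__(self, d_model: int):
        super().__init__()
        self.fwd = Mamba(d_model=d_model)            # left-to-right scan
        self.bwd = Mamba(d_model=d_model)            # right-to-left scan
        self.proj = nn.Linear(2 * d_model, d_model)  # fuse both directions
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels)
        y_fwd = self.fwd(x)
        # Reverse time, scan, then reverse back so both outputs are aligned.
        y_bwd = self.bwd(torch.flip(x, dims=[1])).flip(dims=[1])
        return self.norm(x + self.proj(torch.cat([y_fwd, y_bwd], dim=-1)))
```

A causal variant of the kind the abstract mentions would drop the backward scan and keep only the left-to-right branch, since it may not look at future frames.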

Tasks

Speaker Separation

Benchmark Results

| Dataset   | Model             | Metric  | Claimed | Verified | Status     |
|-----------|-------------------|---------|---------|----------|------------|
| WSJ0-2mix | SepMamba + DM (M) | SI-SDRi | 22.7    |          | Unverified |
| WSJ0-2mix | SepMamba + DM (S) | SI-SDRi | 21.2    |          | Unverified |
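
The SI-SDRi metric reported above is the scale-invariant signal-to-distortion ratio of the separated estimate minus that of the unprocessed mixture, both measured against the clean target, in dB. A minimal NumPy sketch of the metric (the function names are ours, not from the paper or any benchmark harness):

```python
import numpy as np


def si_sdr(estimate: np.ndarray, target: np.ndarray) -> float:
    """Scale-invariant signal-to-distortion ratio in dB (zero-mean signals)."""
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Optimal scaling of the target that best explains the estimate.
    alpha = np.dot(estimate, target) / np.dot(target, target)
    s_target = alpha * target
    e_noise = estimate - s_target
    return 10.0 * np.log10(np.dot(s_target, s_target) / np.dot(e_noise, e_noise))


def si_sdr_improvement(estimate, mixture, target) -> float:
    """SI-SDRi: how much the separator improves over the raw mixture."""
    return si_sdr(estimate, target) - si_sdr(mixture, target)
```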

Reproductions

No reproductions have been submitted yet.