Many-Speakers Single Channel Speech Separation with Optimal Permutation Training
Shaked Dovrat, Eliya Nachmani, Lior Wolf
Code (official, in paper): github.com/shakeddovrat/librimix
Abstract
Single channel speech separation has seen great progress in recent years. However, training neural speech separation models for a large number of speakers (e.g., more than 10) is out of reach for current methods, which rely on the Permutation Invariant Training (PIT) loss. In this work, we present a permutation invariant training scheme that employs the Hungarian algorithm to train with O(C^3) time complexity, where C is the number of speakers, compared to the O(C!) of PIT-based methods. Furthermore, we present a modified architecture that can handle the increased number of speakers. Our approach separates up to 20 speakers and improves on previous results for large C by a wide margin.
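The optimal-permutation matching the abstract describes can be sketched with SciPy's `linear_sum_assignment`, which implements the Hungarian (Kuhn-Munkres) algorithm in polynomial time. This is a minimal illustration of the idea, not the paper's actual training code; the function name and the toy loss matrix are made up for the example:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_pit_loss(pairwise_loss):
    """Given a C x C matrix whose (i, j) entry is the loss of assigning
    estimated source i to reference source j, return the mean loss under
    the optimal permutation and the permutation itself.

    The Hungarian algorithm solves this assignment in O(C^3), whereas
    plain PIT enumerates all C! permutations.
    """
    rows, cols = linear_sum_assignment(pairwise_loss)
    return pairwise_loss[rows, cols].mean(), cols

# Toy example with C = 3 sources: the diagonal is the cheapest matching.
losses = np.array([[0.1, 0.9, 0.8],
                   [0.7, 0.2, 0.9],
                   [0.8, 0.9, 0.3]])
loss, perm = hungarian_pit_loss(losses)
# loss → 0.2, perm → [0, 1, 2]
```

In training, `pairwise_loss` would hold per-pair negative SI-SDR values between every estimated and reference source, and the matched loss is backpropagated through the network.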
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| Libri5Mix | Hungarian PIT | SI-SDRi | 12.72 | — | Unverified |
| Libri10Mix | Hungarian PIT | SI-SDRi | 7.78 | — | Unverified |
| Libri15Mix | Hungarian PIT | SI-SDRi | 5.66 | — | Unverified |
| Libri20Mix | Hungarian PIT | SI-SDRi | 4.26 | — | Unverified |
| WSJ0-5mix | Hungarian PIT | SI-SDRi | 13.22 | — | Unverified |
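The benchmark metric, SI-SDRi, is the scale-invariant signal-to-distortion ratio of the separated output minus that of the unprocessed mixture. A minimal sketch of SI-SDR (function name is ours; a real evaluation would average SI-SDRi over all sources after optimal matching):

```python
import numpy as np

def si_sdr(est, ref):
    """Scale-invariant SDR in dB between an estimate and a reference.

    The estimate is projected onto the reference to obtain the target
    component; everything else counts as distortion, so rescaling the
    estimate does not change the score.
    """
    ref = ref - ref.mean()
    est = est - est.mean()
    s_target = (np.dot(est, ref) / np.dot(ref, ref)) * ref
    e_noise = est - s_target
    return 10 * np.log10(np.dot(s_target, s_target) / np.dot(e_noise, e_noise))

t = np.linspace(0, 1, 8000)
ref = np.sin(2 * np.pi * 440 * t)          # clean reference source
est = ref + 0.1 * np.cos(2 * np.pi * 300 * t)  # estimate with interference
score = si_sdr(est, ref)                   # positive: estimate is close to ref
```

SI-SDRi for one source would then be `si_sdr(est, ref) - si_sdr(mixture, ref)`.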