Many-Speakers Single Channel Speech Separation with Optimal Permutation Training
Shaked Dovrat, Eliya Nachmani, Lior Wolf
Code (official, in paper): github.com/shakeddovrat/librimix
Abstract
Single channel speech separation has seen great progress in recent years. However, training neural speech separation models for a large number of speakers (e.g., more than 10) is out of reach for current methods, which rely on the Permutation Invariant Training (PIT) loss. In this work, we present a permutation invariant training scheme that employs the Hungarian algorithm to train with O(C^3) time complexity, where C is the number of speakers, compared to the O(C!) of PIT-based methods. Furthermore, we present a modified architecture that can handle the increased number of speakers. Our approach separates up to 20 speakers and improves on previous results for large C by a wide margin.
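The optimal-permutation matching the abstract describes can be sketched with SciPy's `linear_sum_assignment`, which implements the Hungarian (Kuhn-Munkres) algorithm in polynomial time. This is a minimal illustration of the idea, not the paper's actual training code; the function name and the toy loss matrix are made up for the example:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_pit_loss(pairwise_loss):
    """Given a C x C matrix whose (i, j) entry is the loss of assigning
    estimated source i to reference source j, return the mean loss under
    the optimal permutation and the permutation itself.

    The Hungarian algorithm solves this assignment in O(C^3), whereas
    plain PIT enumerates all C! permutations.
    """
    rows, cols = linear_sum_assignment(pairwise_loss)
    return pairwise_loss[rows, cols].mean(), cols

# Toy example with C = 3 sources: the diagonal is the cheapest matching.
losses = np.array([[0.1, 0.9, 0.8],
                   [0.7, 0.2, 0.9],
                   [0.8, 0.9, 0.3]])
loss, perm = hungarian_pit_loss(losses)
# loss → 0.2, perm → [0, 1, 2]
```

In training, `pairwise_loss` would hold per-pair negative SI-SDR values between every estimated and reference source, and the matched loss is backpropagated through the network.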
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| Libri5Mix | Hungarian PIT | SI-SDRi | 12.72 | — | Unverified |
| Libri10Mix | Hungarian PIT | SI-SDRi | 7.78 | — | Unverified |
| Libri15Mix | Hungarian PIT | SI-SDRi | 5.66 | — | Unverified |
| Libri20Mix | Hungarian PIT | SI-SDRi | 4.26 | — | Unverified |
| WSJ0-5mix | Hungarian PIT | SI-SDRi | 13.22 | — | Unverified |
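The benchmark metric, SI-SDRi, is the scale-invariant signal-to-distortion ratio of the separated output minus that of the unprocessed mixture. A minimal sketch of SI-SDR (function name is ours; a real evaluation would average SI-SDRi over all sources after optimal matching):

```python
import numpy as np

def si_sdr(est, ref):
    """Scale-invariant SDR in dB between an estimate and a reference.

    The estimate is projected onto the reference to obtain the target
    component; everything else counts as distortion, so rescaling the
    estimate does not change the score.
    """
    ref = ref - ref.mean()
    est = est - est.mean()
    s_target = (np.dot(est, ref) / np.dot(ref, ref)) * ref
    e_noise = est - s_target
    return 10 * np.log10(np.dot(s_target, s_target) / np.dot(e_noise, e_noise))

t = np.linspace(0, 1, 8000)
ref = np.sin(2 * np.pi * 440 * t)          # clean reference source
est = ref + 0.1 * np.cos(2 * np.pi * 300 * t)  # estimate with interference
score = si_sdr(est, ref)                   # positive: estimate is close to ref
```

SI-SDRi for one source would then be `si_sdr(est, ref) - si_sdr(mixture, ref)`.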