Stabilizing Label Assignment for Speech Separation by Self-supervised Pre-training
Sung-Feng Huang, Shun-Po Chuang, Da-Rong Liu, Yi-Chen Chen, Gene-Ping Yang, Hung-Yi Lee
Code
- github.com/SungFeng-Huang/SSL-pretraining-separation (official implementation, in paper, PyTorch, ★ 63)
Abstract
Speech separation is well developed, and permutation invariant training (PIT) has proven very successful for it. However, the frequent switching of label assignments during PIT training remains a problem when faster convergence and better final performance are desired. In this paper, we propose self-supervised pre-training to stabilize the label assignment when training speech separation models. Experiments over several types of self-supervised approaches, several typical speech separation models, and two different datasets show that substantial improvements are achievable when a suitable self-supervised approach is chosen.
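The label-assignment switching the abstract refers to comes from how PIT picks its training targets: at each step, the loss is evaluated under every permutation of the model's output sources, and the best-scoring permutation becomes the label assignment for that step, which can flip between steps. A minimal sketch of this assignment step, using SI-SDR as the per-source score (NumPy stand-in for illustration; the paper's models are trained in PyTorch):

```python
import itertools
import numpy as np

def si_sdr(est, ref, eps=1e-8):
    """Scale-invariant SDR (dB) between an estimated and a reference signal."""
    ref = ref - ref.mean()
    est = est - est.mean()
    # Project the estimate onto the reference to obtain the scaled target.
    target = (np.dot(est, ref) / (np.dot(ref, ref) + eps)) * ref
    noise = est - target
    return 10 * np.log10(np.dot(target, target) / (np.dot(noise, noise) + eps))

def pit_assignment(estimates, references):
    """Return (best_permutation, best_mean_si_sdr) over all source orderings.

    PIT scores every permutation of the model outputs against the
    references and trains with the best one; the chosen permutation is
    the "label assignment" that can switch between training steps.
    """
    n = len(estimates)
    best_perm, best_score = None, -np.inf
    for perm in itertools.permutations(range(n)):
        score = np.mean([si_sdr(estimates[p], references[i])
                         for i, p in enumerate(perm)])
        if score > best_score:
            best_perm, best_score = perm, score
    return best_perm, best_score
```

Pre-training the separator (e.g. on Libri1Mix speech enhancement, as in the benchmark table below) gives the outputs a consistent ordering early on, so this argmax over permutations stops flip-flopping.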
Tasks
- Speech Separation
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| Libri2Mix | Conv-TasNet (Libri1Mix speech enhancement pre-trained) | SI-SDRi (dB) | 14.1 | — | Unverified |
| Libri2Mix | Conv-TasNet (Libri1Mix speech enhancement multi-task) | SI-SDRi (dB) | 13.7 | — | Unverified |
| Libri2Mix | Conv-TasNet | SI-SDRi (dB) | 13.2 | — | Unverified |
| WSJ0-2mix | DPTNet (Libri1Mix speech enhancement pre-trained) | SI-SDRi (dB) | 21.3 | — | Unverified |