Wavesplit: End-to-End Speech Separation by Speaker Clustering

2020-02-20

Neil Zeghidour, David Grangier

Abstract

We introduce Wavesplit, an end-to-end source separation system. From a single mixture, the model infers a representation for each source and then estimates each source signal given the inferred representations. The model is trained to jointly perform both tasks from the raw waveform. Wavesplit infers a set of source representations via clustering, which addresses the fundamental permutation problem of separation. For speech separation, our sequence-wide speaker representations provide a more robust separation of long, challenging recordings compared to prior work. Wavesplit redefines the state-of-the-art on clean mixtures of 2 or 3 speakers (WSJ0-2/3mix), as well as in noisy and reverberated settings (WHAM/WHAMR). We also set a new benchmark on the recent LibriMix dataset. Finally, we show that Wavesplit is also applicable to other domains, by separating fetal and maternal heart rates from a single abdominal electrocardiogram.
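The permutation problem mentioned in the abstract can be illustrated with a small sketch. A separator may output the sources in any order, so comparing output i with reference i can penalize a perfect but reordered result. The code below is not Wavesplit's clustering approach; it is the generic brute-force search over orderings, shown only to make the problem concrete. The function name `best_permutation` and the synthetic sine-wave "speakers" are assumptions made for this illustration.

```python
from itertools import permutations

import numpy as np


def best_permutation(estimates, references):
    """Find the ordering of estimates that best matches the references.

    Returns (order, error), where order[i] is the index of the estimate
    matched to references[i] and error is the summed mean squared error.
    Hypothetical helper for illustration; not taken from the paper.
    """
    n = len(references)
    best_order, best_err = None, float("inf")
    for order in permutations(range(n)):
        err = sum(
            np.mean((estimates[j] - references[i]) ** 2)
            for i, j in enumerate(order)
        )
        if err < best_err:
            best_order, best_err = order, err
    return best_order, best_err


# Two synthetic "speakers" (pure tones stand in for speech here).
t = np.linspace(0.0, 1.0, 8000, endpoint=False)
speaker_a = np.sin(2 * np.pi * 220 * t)
speaker_b = np.sin(2 * np.pi * 330 * t)
references = np.stack([speaker_a, speaker_b])

# A perfect separation whose outputs happen to come out swapped.
estimates = np.stack([speaker_b, speaker_a])

# Comparing in fixed order reports a large error despite perfect separation.
naive_err = np.mean((estimates - references) ** 2)

# Searching over orderings recovers the correct assignment.
order, err = best_permutation(estimates, references)
```

The search cost grows factorially with the number of sources, which is one reason to prefer approaches that avoid an explicit search over orderings.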

Tasks

Clustering, Data Augmentation, Speech Separation

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
WHAMR!	Wavesplit	SI-SDRi	13.2	—	Unverified
WSJ0-2mix	Wavesplit v2	SI-SDRi	22.2	—	Unverified
WSJ0-2mix	Wavesplit v1	SI-SDRi	19	—	Unverified
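
The metric in the table, SI-SDRi, is the improvement in scale-invariant signal-to-distortion ratio (in dB) of the separated estimate over the unprocessed mixture. A minimal sketch of the commonly used definition follows; it assumes single-channel NumPy arrays of equal length, and the function names, the `eps` stabilizer, and the random test signals are illustrative choices, not the evaluation code behind the numbers above.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB (illustrative sketch)."""
    # Remove the mean so a constant offset does not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the target component.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    noise = estimate - target
    return 10 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))


def si_sdr_improvement(estimate, mixture, reference):
    """SI-SDRi: SI-SDR of the estimate minus SI-SDR of the raw mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)


# Synthetic example: random noise stands in for a target and an interferer.
rng = np.random.default_rng(0)
reference = rng.standard_normal(16000)
interference = rng.standard_normal(16000)
mixture = reference + interference
# A hypothetical separator that attenuates the interferer by a factor of 10.
estimate = reference + 0.1 * interference

gain = si_sdr_improvement(estimate, mixture, reference)
```

In this synthetic case the interferer's amplitude drops by a factor of 10, so `gain` comes out close to 20 dB. Rescaling the estimate leaves the score essentially unchanged, which is what "scale-invariant" refers to.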
