SOTAVerified

Speech Separation

Speech separation is the task of extracting all overlapping speech sources from a given mixed speech signal. It is a special case of the source separation problem in which the focus is only on the overlapping speech sources; other interference, such as music or noise, is not the main concern. A recent representative GitHub project is ClearerVoice-Studio.

Source: A Unified Framework for Speech Separation

Image credit: Speech Separation of A Target Speaker Based on Deep Neural Networks

Papers

Showing 51-100 of 359 papers

| Title | Status | Hype |
| --- | --- | --- |
| SepMamba: State-space models for speaker separation using Mamba | Code | 1 |
| Unifying Speech Enhancement and Separation with Gradient Modulation for End-to-End Noise-Robust Speech Separation | Code | 1 |
| Noise-robust Speech Separation with Fast Generative Correction | Code | 1 |
| SepPrune: Structured Pruning for Efficient Deep Speech Separation | Code | 1 |
| VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency | Code | 1 |
| GEV Beamforming Supported by DOA-based Masks Generated on Pairs of Microphones | Code | 1 |
| Attention is All You Need in Speech Separation | Code | 1 |
| Enhanced Reverberation as Supervision for Unsupervised Speech Separation | Code | 1 |
| MESH2IR: Neural Acoustic Impulse Response Generator for Complex 3D Scenes | Code | 1 |
| Compute and memory efficient universal sound source separation | Code | 1 |
| TDFNet: An Efficient Audio-Visual Speech Separation Model with Top-down Fusion | Code | 1 |
| Continuous speech separation: dataset and analysis | Code | 1 |
| Effective Low-Cost Time-Domain Audio Separation Using Globally Attentive Locally Recurrent Networks | Code | 1 |
| Continuous Speech Separation with Conformer | Code | 1 |
| Graph-PIT: Generalized permutation invariant training for continuous separation of arbitrary numbers of speakers | Code | 1 |
| A cappella: Audio-visual Singing Voice Separation | Code | 1 |
| An Audio-Visual Speech Separation Model Inspired by Cortico-Thalamo-Cortical Circuits | Code | 1 |
| End-to-end Microphone Permutation and Number Invariant Multi-channel Speech Separation | Code | 1 |
| Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation | Code | 1 |
| RTFS-Net: Recurrent Time-Frequency Modelling for Efficient Audio-Visual Speech Separation | Code | 1 |
| Text-aware Speech Separation for Multi-talker Keyword Spotting | Code | 1 |
| Online speaker diarization of meetings guided by speech separation | Code | 1 |
| The Cone of Silence: Speech Separation by Localization | Code | 1 |
| Improving speaker discrimination of target speech extraction with time-domain SpeakerBeam | Code | 1 |
| Speech Separation with Pretrained Frontend to Minimize Domain Mismatch | Code | 0 |
| Complementing Handcrafted Features with Raw Waveform Using a Light-weight Auxiliary Model | Code | 0 |
| Analysis of impact of emotions on target speech extraction and speech separation | Code | 0 |
| SPGM: Prioritizing Local Features for enhanced speech separation performance | Code | 0 |
| Speaker Extraction with Co-Speech Gestures Cue | Code | 0 |
| CasNet: Investigating Channel Robustness for Speech Separation | Code | 0 |
| Singing Voice Separation with Deep U-Net Convolutional Networks | Code | 0 |
| ADL-MVDR: All deep learning MVDR beamformer for target speech separation | Code | 0 |
| A Multi-Phase Gammatone Filterbank for Speech Separation via TasNet | Code | 0 |
| Divide and Conquer: A Deep CASA Approach to Talker-independent Monaural Speaker Separation | Code | 0 |
| Beyond Speaker Identity: Text Guided Target Speech Extraction | Code | 0 |
| Semi-Supervised Monaural Singing Voice Separation With a Masking Network Trained on Synthetic Mixtures | Code | 0 |
| CSLNSpeech: solving extended speech separation problem with the help of Chinese sign language | Code | 0 |
| Disentangling the Impacts of Language and Channel Variability on Speech Separation Networks | Code | 0 |
| Permutation Invariant Training of Deep Models for Speaker-Independent Multi-talker Speech Separation | Code | 0 |
| REAL-M: Towards Speech Separation on Real Mixtures | Code | 0 |
| Onssen: an open-source speech separation and enhancement library | Code | 0 |
| Exploring Self-Attention Mechanisms for Speech Separation | Code | 0 |
| Real-time Single-channel Dereverberation and Separation with Time-domain Audio Separation Network | Code | 0 |
| Multi-talker Speech Separation with Utterance-level Permutation Invariant Training of Deep Recurrent Neural Networks | Code | 0 |
| Deep Recurrent NMF for Speech Separation by Unfolding Iterative Thresholding | Code | 0 |
| An enhanced Conv-TasNet model for speech separation using a speaker distance-based loss function | Code | 0 |
| Alternative Objective Functions for Deep Clustering | Code | 0 |
| Deep learning for monaural speech separation | Code | 0 |
| Multi-Decoder DPRNN: High Accuracy Source Counting and Separation | Code | 0 |
| Resource-Efficient Separation Transformer | Code | 0 |

Benchmark Results

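| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SepReformer-L | SI-SDRi | 25.1 | | Unverified |
| 2 | TF-Locoformer (L) + DM | SI-SDRi | 25.1 | | Unverified |
| 3 | TF-Locoformer (M) + DM | SI-SDRi | 24.6 | | Unverified |
| 4 | TF-Locoformer (L) | SI-SDRi | 24.2 | | Unverified |
| 5 | MossFormer2 (L) | SI-SDRi | 24.1 | | Unverified |
| 6 | SepTDA (L=12) | SI-SDRi | 24 | | Unverified |
| 7 | Separate And Diffuse | SI-SDRi | 23.9 | | Unverified |
| 8 | TF-Locoformer (M) | SI-SDRi | 23.6 | | Unverified |
| 9 | TF-Locoformer (S) + DM | SI-SDRi | 22.8 | | Unverified |
| 10 | MossFormer (L) + DM | SI-SDRi | 22.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | TF-Locoformer (M) | SI-SDRi | 18.5 | | Unverified |
| 2 | TF-Locoformer (S) | SI-SDRi | 17.4 | | Unverified |
| 3 | SepReformer-L + DM | SI-SDRi | 17.1 | | Unverified |
| 4 | MossFormer2 | SI-SDRi | 17 | | Unverified |
| 5 | MossFormer (L) + DM | SI-SDRi | 16.3 | | Unverified |
| 6 | TD-Conformer (XL) + DM | SI-SDRi | 14.6 | | Unverified |
| 7 | Improved Sudo rm -rf (U=36) | SI-SDRi | 13.5 | | Unverified |
| 8 | TD-Conformer (L) + DM | SI-SDRi | 13.4 | | Unverified |
| 9 | Wavesplit | SI-SDRi | 13.2 | | Unverified |
| 10 | DPTNET - SRSSN | SI-SDRi | 12.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MossFormer2 (w speed perturb) | SI-SDRi | 22.2 | | Unverified |
| 2 | TF-Locoformer (M) | SI-SDRi | 22.1 | | Unverified |
| 3 | MossFormer2 (w/o DM) | SI-SDRi | 21.7 | | Unverified |
| 4 | Separate And Diffuse | SI-SDRi | 21.5 | | Unverified |
| 5 | WHYV | SI-SDRi | 17.5 | | Unverified |
| 6 | TDANet Large | SI-SDRi | 17.4 | | Unverified |
| 7 | TDANet | SI-SDRi | 16.9 | | Unverified |
| 8 | Conv-Tasnet (Libri1Mix speech enhancement pre-trained) | SI-SDRi | 14.1 | | Unverified |
| 9 | Conv-Tasnet (Libri1Mix speech enhancement multi-task) | SI-SDRi | 13.7 | | Unverified |
| 10 | Conv-Tasnet | SI-SDRi | 13.2 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SepTDA | SI-SDRi | 23.7 | | Unverified |
| 2 | MossFormer2 | SI-SDRi | 22.2 | | Unverified |
| 3 | MossFormer (L) + DM | SI-SDRi | 21.2 | | Unverified |
| 4 | Separate And Diffuse | SI-SDRi | 20.9 | | Unverified |
| 5 | MossFormer (M) + DM | SI-SDRi | 20.8 | | Unverified |
| 6 | SepIt | SI-SDRi | 20.1 | | Unverified |
| 7 | SepFormer | SI-SDRi | 19.5 | | Unverified |
| 8 | Sandglasset | SI-SDRi | 17.1 | | Unverified |
| 9 | Gated DualPathRNN | SI-SDRi | 16.85 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | IIANet | SI-SNRi | 16.4 | | Unverified |
| 2 | TDFNet-large | SI-SNRi | 15.8 | | Unverified |
| 3 | TDFNet (MHSA + Shared) | SI-SNRi | 15 | | Unverified |
| 4 | RTFS-Net-12 | SI-SNRi | 14.9 | | Unverified |
| 5 | RTFS-Net-6 | SI-SNRi | 14.6 | | Unverified |
| 6 | CTCNet | SI-SNRi | 14.3 | | Unverified |
| 7 | RTFS-Net-4 | SI-SNRi | 14.1 | | Unverified |
| 8 | TDFNet-small | SI-SNRi | 13.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SepReformer-L + DM | SI-SDRi | 18.4 | | Unverified |
| 2 | MossFormer2 | SI-SDRi | 18.1 | | Unverified |
| 3 | MossFormer (L) + DM | SI-SDRi | 17.3 | | Unverified |
| 4 | TDANet Large | SI-SDRi | 15.2 | | Unverified |
| 5 | TDANet | SI-SDRi | 14.8 | | Unverified |
| 6 | WHYV | SI-SDRi | 12.96 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SepTDA | SI-SDRi | 21 | | Unverified |
| 2 | Hungarian PIT | SI-SDRi | 13.22 | | Unverified |
| 3 | Conditional TasNet | SI-SDRi | 11.7 | | Unverified |
| 4 | TasTas | SI-SDRi | 11.14 | | Unverified |
| 5 | Gated DualPathRNN | SI-SDRi | 10.56 | | Unverified |
| 6 | Multi-Decoder DPRNN | SI-SDRi | 5.9 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | IIANet | SI-SNRi | 18.3 | | Unverified |
| 2 | RTFS-Net-12 | SI-SNRi | 17.5 | | Unverified |
| 3 | CTCNet | SI-SNRi | 17.4 | | Unverified |
| 4 | RTFS-Net-6 | SI-SNRi | 16.9 | | Unverified |
| 5 | RTFS-Net-4 | SI-SNRi | 15.5 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | IIANet | SI-SNRi | 14 | | Unverified |
| 2 | RTFS-Net-12 | SI-SNRi | 12.4 | | Unverified |
| 3 | CTCNet | SI-SNRi | 11.9 | | Unverified |
| 4 | RTFS-Net-6 | SI-SNRi | 11.8 | | Unverified |
| 5 | RTFS-Net-4 | SI-SNRi | 11.5 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | SepTDA | SI-SDRi | 22 | | Unverified |
| 2 | Gated DualPathRNN | SI-SDRi | 12.88 | | Unverified |
| 3 | Conditional TasNet | SI-SDRi | 12.5 | | Unverified |
| 4 | OR-PIT | SI-SDRi | 10.2 | | Unverified |
| 5 | Multi-Decoder DPRNN | SI-SDRi | 9.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Separate And Diffuse | SI-SDRi | 14.2 | | Unverified |
| 2 | SepIt | SI-SDRi | 13.7 | | Unverified |
| 3 | OCD | SI-SDRi | 13.4 | | Unverified |
| 4 | Hungarian PIT | SI-SDRi | 12.72 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Separate And Diffuse | SI-SDRi | 9 | | Unverified |
| 2 | SepIt | SI-SDRi | 8.2 | | Unverified |
| 3 | Hungarian PIT | SI-SDRi | 7.78 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | | SDR | 9.6 | | Unverified |
| 2 | Audio-Visual concat-ref | SDR | 8.05 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Separate And Diffuse | SI-SDRi | 5.2 | | Unverified |
| 2 | Hungarian PIT | SI-SDRi | 4.26 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Conformer (base) | 0S | 5.6 | | Unverified |
| 2 | Conformer (large) | 0S | 5.4 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Hungarian PIT | SI-SDRi | 5.66 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Audio-Visual concat-ref | SDR | 10.55 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MossFormer2 | SI-SDRi | 20.5 | | Unverified |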
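
Most of the leaderboards above report SI-SDRi, the improvement in scale-invariant signal-to-distortion ratio (in dB) of a separated source over the unprocessed mixture. The following is a minimal NumPy sketch of how such a number is commonly computed. It is not the evaluation code of any listed paper. The function names `si_sdr` and `si_sdr_improvement`, the `eps` constant, and the synthetic white-noise "sources" are assumptions made purely for illustration. SI-SNRi is typically computed in the same way on zero-mean signals, though individual papers may differ in detail.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between two 1-D signals (illustrative helper)."""
    # Remove the mean so a constant offset does not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to obtain the scaled target.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    # Everything in the estimate that is not the scaled target counts as distortion.
    distortion = estimate - target
    ratio = (np.dot(target, target) + eps) / (np.dot(distortion, distortion) + eps)
    return 10.0 * np.log10(ratio)


def si_sdr_improvement(estimate, reference, mixture):
    """SI-SDRi: SI-SDR of the estimate minus SI-SDR of the unprocessed mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)


# Synthetic stand-ins for two speakers (white noise, not real speech).
rng = np.random.default_rng(0)
s1 = rng.standard_normal(16000)
s2 = rng.standard_normal(16000)
mixture = s1 + s2

# A hypothetical separator output: speaker 1 with 10% leakage from speaker 2.
estimate = s1 + 0.1 * s2

improvement = si_sdr_improvement(estimate, s1, mixture)
print(f"SI-SDRi: {improvement:.2f} dB")
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score unchanged, which is what distinguishes SI-SDR from plain SDR. In a multi-speaker evaluation the score is usually taken under the best assignment of estimates to references, which this sketch does not cover.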