SOTAVerified

Speech Separation

Speech separation is the task of extracting every overlapping speech source from a given mixed speech signal. It is a special case of the source separation problem in which the focus is only on the overlapping speech sources; other interferences, such as music or noise, are not the main concern. A recent representative GitHub project is ClearerVoice-Studio.
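Here is a minimal sketch of the problem setup. It assumes a single-channel mixture that is the plain sum of S speaker signals, with no noise or reverberation. A separator's output channels have no inherent speaker order, so training and scoring usually pick the best output-to-speaker assignment. This is the idea behind permutation invariant training (PIT), which several of the papers listed below build on (for example utterance-level PIT and Hungarian PIT). Using mean squared error as the per-pair cost is an illustrative assumption, and the function names and synthetic signals are hypothetical.

```python
import itertools

import numpy as np
from scipy.optimize import linear_sum_assignment


def pairwise_mse(estimates: np.ndarray, references: np.ndarray) -> np.ndarray:
    """Cost matrix C[i, j] = MSE between estimated channel i and reference speaker j."""
    # Both inputs have shape (num_speakers, num_samples)
    diff = estimates[:, None, :] - references[None, :, :]
    return np.mean(diff ** 2, axis=-1)


def pit_bruteforce(estimates: np.ndarray, references: np.ndarray):
    """Try every assignment of output channels to speakers (cost grows as S!)."""
    cost = pairwise_mse(estimates, references)
    num_speakers = cost.shape[0]
    best_perm, best_loss = None, np.inf
    for perm in itertools.permutations(range(num_speakers)):
        # perm[i] is the reference speaker assigned to output channel i
        loss = np.mean([cost[i, perm[i]] for i in range(num_speakers)])
        if loss < best_loss:
            best_perm, best_loss = perm, float(loss)
    return best_perm, best_loss


def pit_hungarian(estimates: np.ndarray, references: np.ndarray):
    """Same minimum via the Hungarian algorithm, polynomial in S."""
    cost = pairwise_mse(estimates, references)
    rows, cols = linear_sum_assignment(cost)  # rows come back sorted: 0..S-1
    return tuple(int(c) for c in cols), float(cost[rows, cols].mean())


# Signal model (assumed): the observed mixture is the sum of the speaker signals
rng = np.random.default_rng(0)
sources = rng.standard_normal((2, 16000))  # two synthetic "speakers", 1 s at 16 kHz
mixture = sources.sum(axis=0)

# A stand-in separator output: speakers come back in swapped order, slightly noisy
estimates = sources[::-1] + 0.01 * rng.standard_normal(sources.shape)

perm, loss = pit_bruteforce(estimates, sources)
print(perm)  # (1, 0): output channel 0 holds reference speaker 1
```

Brute force is exact, but its cost grows as S!, which becomes impractical with many speakers. This cost is a sum of per-pair terms, so the Hungarian algorithm (`scipy.optimize.linear_sum_assignment`) reaches the same minimum loss in polynomial time.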

Source: A Unified Framework for Speech Separation

Image credit: Speech Separation of A Target Speaker Based on Deep Neural Networks

Papers

Showing 251-300 of 359 papers

Title | Status | Hype
Progressive Learning for Stabilizing Label Selection in Speech Separation with Mapping-based Method | | 0
Time-Domain Speech Extraction with Spatial Information and Multi Speaker Conditioning Mechanism | | 0
Time-Frequency-Based Attention Cache Memory Model for Real-Time Speech Separation | | 0
TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition | | 0
Towards Listening to 10 People Simultaneously: An Efficient Permutation Invariant Training of Audio Source Separation Using Sinkhorn's Algorithm | | 0
Towards Real-Time Single-Channel Speech Separation in Noisy and Reverberant Environments | | 0
Transcription-Free Fine-Tuning of Speech Separation Models for Noisy and Reverberant Multi-Speaker Automatic Speech Recognition | | 0
Tune-In: Training Under Negative Environments with Interference for Attention Networks Simulating Cocktail Party Effect | | 0
Ultra Fast Speech Separation Model with Teacher Student Learning | | 0
Ultra-Lightweight Speech Separation via Group Communication | | 0
U-Mamba-Net: A highly efficient Mamba-based U-net style network for noisy and reverberant speech separation | | 0
UNSSOR: Unsupervised Neural Speech Separation by Leveraging Over-determined Training Mixtures | | 0
Unsupervised Sound Separation Using Mixture Invariant Training | | 0
Using Optimal Ratio Mask as Training Target for Supervised Speech Separation | | 0
USTC-NELSLIP System Description for DIHARD-III Challenge | | 0
Utterance-level Permutation Invariant Training with Latency-controlled BLSTM for Single-channel Multi-talker Speech Separation | | 0
VarArray: Array-Geometry-Agnostic Continuous Speech Separation | | 0
VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition | | 0
Wanna hear your voice? A sample is all we need! | | 0
Wavesplit: End-to-End Speech Separation by Speaker Clustering | | 0
X-DC: Explainable Deep Clustering based on Learnable Spectrogram Templates | | 0
Universal Sound Separation | | 0
Probabilistic Permutation Invariant Training for Speech Separation | | 0
Probing Self-supervised Learning Models with Target Speech Extraction | | 0
Progressive Joint Modeling in Unsupervised Single-channel Overlapped Speech Recognition | | 0
Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks | | 0
Provable Subspace Identification Under Post-Nonlinear Mixtures | | 0
RadioSES: mmWave-Based Audioradio Speech Enhancement and Separation System | | 0
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation | | 0
Real-time Speech Enhancement and Separation with a Unified Deep Neural Network for Single/Dual Talker Scenarios | | 0
Recognizing Overlapped Speech in Meetings: A Multichannel Separation Approach Using Neural Networks | | 0
Recycling an anechoic pre-trained speech separation deep neural network for binaural dereverberation of a single source | | 0
Remix-cycle-consistent Learning on Adversarially Learned Separator for Accurate and Stable Unsupervised Speech Separation | | 0
Resource-constrained stereo singing voice cancellation | | 0
Reverberation as Supervision for Speech Separation | | 0
Robust Active Speaker Detection in Noisy Environments | | 0
Robustness of Speech Separation Models for Similar-pitch Speakers | | 0
Royalflush Speaker Diarization System for ICASSP 2022 Multi-channel Multi-party Meeting Transcription Challenge | | 0
SATTS: Speaker Attractor Text to Speech, Learning to Speak by Learning to Separate | | 0
Scaling strategies for on-device low-complexity source separation with Conv-Tasnet | | 0
SCA: Streaming Cross-attention Alignment for Echo Cancellation | | 0
Seeing Through the Conversation: Audio-Visual Speech Separation based on Diffusion Model | | 0
Self-Remixing: Unsupervised Speech Separation via Separation and Remixing | | 0
SepALM: Audio Language Models Are Error Correctors for Robust Speech Separation | | 0
Separate And Diffuse: Using a Pretrained Diffusion Model for Improving Source Separation | | 0
Separating Long-Form Speech with Group-Wise Permutation Invariant Training | | 0
Separation Guided Speaker Diarization in Realistic Mismatched Conditions | | 0
Separator-Transducer-Segmenter: Streaming Recognition and Segmentation of Multi-party Speech | | 0
SepIt: Approaching a Single Channel Speech Separation Bound | | 0
Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | TF-Locoformer (L) + DM | SI-SDRi | 25.1 | | Unverified
2 | SepReformer-L | SI-SDRi | 25.1 | | Unverified
3 | TF-Locoformer (M) + DM | SI-SDRi | 24.6 | | Unverified
4 | TF-Locoformer (L) | SI-SDRi | 24.2 | | Unverified
5 | MossFormer2 (L) | SI-SDRi | 24.1 | | Unverified
6 | SepTDA (L=12) | SI-SDRi | 24 | | Unverified
7 | Separate And Diffuse | SI-SDRi | 23.9 | | Unverified
8 | TF-Locoformer (M) | SI-SDRi | 23.6 | | Unverified
9 | MossFormer (L) + DM | SI-SDRi | 22.8 | | Unverified
10 | TF-Locoformer (S) + DM | SI-SDRi | 22.8 | | Unverified
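Most of the ranking tables in this section report SI-SDRi, and a few use SI-SNRi or SDR. The following is a minimal sketch that assumes the commonly used definition of scale-invariant SDR: both signals are made zero-mean, and the reference is rescaled by its least-squares projection onto the estimate. Under this definition, the improvement (in dB) is the SI-SDR of a separated signal minus the SI-SDR of the unprocessed mixture, with both measured against the same reference speaker. With several speakers, papers typically average this over speakers after choosing the best output permutation. The function names and synthetic signals are hypothetical and are not taken from any listed paper.

```python
import numpy as np


def si_sdr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant signal-to-distortion ratio, in dB, for 1-D signals."""
    # Remove DC offsets first (a common convention in separation toolkits)
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Optimal scaling of the reference: projection of the estimate onto it
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    distortion = estimate - target
    return float(10 * np.log10((np.dot(target, target) + eps) / (np.dot(distortion, distortion) + eps)))


def si_sdr_improvement(estimate, reference, mixture) -> float:
    """SI-SDRi: SI-SDR of the estimate minus SI-SDR of the unprocessed mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)


rng = np.random.default_rng(0)
s1, s2 = rng.standard_normal((2, 16000))  # synthetic stand-ins for two speakers
mixture = s1 + s2
estimate = s1 + 0.1 * rng.standard_normal(16000)  # a hypothetical separator output

print(round(si_sdr(mixture, s1), 1))  # close to 0 dB: the other speaker is as loud as the target
print(round(si_sdr_improvement(estimate, s1, mixture), 1))  # close to 20 dB: residual noise is 20 dB down
```

Because the reference is rescaled before the energy ratio is taken, multiplying the estimate by any nonzero gain leaves the score unchanged. This scale invariance is what separates SI-SDR from plain SNR.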

# | Model | Metric | Claimed | Verified | Status
1 | TF-Locoformer (M) | SI-SDRi | 18.5 | | Unverified
2 | TF-Locoformer (S) | SI-SDRi | 17.4 | | Unverified
3 | SepReformer-L + DM | SI-SDRi | 17.1 | | Unverified
4 | MossFormer2 | SI-SDRi | 17 | | Unverified
5 | MossFormer (L) + DM | SI-SDRi | 16.3 | | Unverified
6 | TD-Conformer (XL) + DM | SI-SDRi | 14.6 | | Unverified
7 | Improved Sudo rm -rf (U=36) | SI-SDRi | 13.5 | | Unverified
8 | TD-Conformer (L) + DM | SI-SDRi | 13.4 | | Unverified
9 | Wavesplit | SI-SDRi | 13.2 | | Unverified
10 | DPTNET - SRSSN | SI-SDRi | 12.3 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MossFormer2 (w speed perturb) | SI-SDRi | 22.2 | | Unverified
2 | TF-Locoformer (M) | SI-SDRi | 22.1 | | Unverified
3 | MossFormer2 (w/o DM) | SI-SDRi | 21.7 | | Unverified
4 | Separate And Diffuse | SI-SDRi | 21.5 | | Unverified
5 | WHYV | SI-SDRi | 17.5 | | Unverified
6 | TDANet Large | SI-SDRi | 17.4 | | Unverified
7 | TDANet | SI-SDRi | 16.9 | | Unverified
8 | Conv-Tasnet (Libri1Mix speech enhancement pre-trained) | SI-SDRi | 14.1 | | Unverified
9 | Conv-Tasnet (Libri1Mix speech enhancement multi-task) | SI-SDRi | 13.7 | | Unverified
10 | Conv-Tasnet | SI-SDRi | 13.2 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SepTDA | SI-SDRi | 23.7 | | Unverified
2 | MossFormer2 | SI-SDRi | 22.2 | | Unverified
3 | MossFormer (L) + DM | SI-SDRi | 21.2 | | Unverified
4 | Separate And Diffuse | SI-SDRi | 20.9 | | Unverified
5 | MossFormer (M) + DM | SI-SDRi | 20.8 | | Unverified
6 | SepIt | SI-SDRi | 20.1 | | Unverified
7 | SepFormer | SI-SDRi | 19.5 | | Unverified
8 | Sandglasset | SI-SDRi | 17.1 | | Unverified
9 | Gated DualPathRNN | SI-SDRi | 16.85 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | IIANet | SI-SNRi | 16.4 | | Unverified
2 | TDFNet-large | SI-SNRi | 15.8 | | Unverified
3 | TDFNet (MHSA + Shared) | SI-SNRi | 15 | | Unverified
4 | RTFS-Net-12 | SI-SNRi | 14.9 | | Unverified
5 | RTFS-Net-6 | SI-SNRi | 14.6 | | Unverified
6 | CTCNet | SI-SNRi | 14.3 | | Unverified
7 | RTFS-Net-4 | SI-SNRi | 14.1 | | Unverified
8 | TDFNet-small | SI-SNRi | 13.6 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SepReformer-L + DM | SI-SDRi | 18.4 | | Unverified
2 | MossFormer2 | SI-SDRi | 18.1 | | Unverified
3 | MossFormer (L) + DM | SI-SDRi | 17.3 | | Unverified
4 | TDANet Large | SI-SDRi | 15.2 | | Unverified
5 | TDANet | SI-SDRi | 14.8 | | Unverified
6 | WHYV | SI-SDRi | 12.96 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SepTDA | SI-SDRi | 21 | | Unverified
2 | Hungarian PIT | SI-SDRi | 13.22 | | Unverified
3 | Conditional TasNet | SI-SDRi | 11.7 | | Unverified
4 | TasTas | SI-SDRi | 11.14 | | Unverified
5 | Gated DualPathRNN | SI-SDRi | 10.56 | | Unverified
6 | Multi-Decoder DPRNN | SI-SDRi | 5.9 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | IIANet | SI-SNRi | 18.3 | | Unverified
2 | RTFS-Net-12 | SI-SNRi | 17.5 | | Unverified
3 | CTCNet | SI-SNRi | 17.4 | | Unverified
4 | RTFS-Net-6 | SI-SNRi | 16.9 | | Unverified
5 | RTFS-Net-4 | SI-SNRi | 15.5 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | IIANet | SI-SNRi | 14 | | Unverified
2 | RTFS-Net-12 | SI-SNRi | 12.4 | | Unverified
3 | CTCNet | SI-SNRi | 11.9 | | Unverified
4 | RTFS-Net-6 | SI-SNRi | 11.8 | | Unverified
5 | RTFS-Net-4 | SI-SNRi | 11.5 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SepTDA | SI-SDRi | 22 | | Unverified
2 | Gated DualPathRNN | SI-SDRi | 12.88 | | Unverified
3 | Conditional TasNet | SI-SDRi | 12.5 | | Unverified
4 | OR-PIT | SI-SDRi | 10.2 | | Unverified
5 | Multi-Decoder DPRNN | SI-SDRi | 9.3 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Separate And Diffuse | SI-SDRi | 14.2 | | Unverified
2 | SepIt | SI-SDRi | 13.7 | | Unverified
3 | OCD | SI-SDRi | 13.4 | | Unverified
4 | Hungarian PIT | SI-SDRi | 12.72 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Separate And Diffuse | SI-SDRi | 9 | | Unverified
2 | SepIt | SI-SDRi | 8.2 | | Unverified
3 | Hungarian PIT | SI-SDRi | 7.78 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | | SDR | 9.6 | | Unverified
2 | Audio-Visual concat-ref | SDR | 8.05 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Separate And Diffuse | SI-SDRi | 5.2 | | Unverified
2 | Hungarian PIT | SI-SDRi | 4.26 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer (base) | 0S | 5.6 | | Unverified
2 | Conformer (large) | 0S | 5.4 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Hungarian PIT | SI-SDRi | 5.66 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Audio-Visual concat-ref | SDR | 10.55 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MossFormer2 | SI-SDRi | 20.5 | | Unverified