SOTAVerified

Speech Separation

Speech separation is the task of extracting every overlapping speech source from a mixed speech signal. It is a special case of the source separation problem in which only the overlapping speech sources are of interest; other interference, such as music or noise, is not the main concern. A recent representative GitHub project is ClearerVoice-Studio.

Source: A Unified Framework for Speech Separation

Image credit: Speech Separation of A Target Speaker Based on Deep Neural Networks

Papers

Showing 151-200 of 359 papers

Title | Status | Hype
Ultra Fast Speech Separation Model with Teacher Student Learning | - | 0
Heterogeneous Separation Consistency Training for Adaptation of Unsupervised Speech Separation | - | 0
RadioSES: mmWave-Based Audioradio Speech Enhancement and Separation System | - | 0
Heterogeneous Target Speech Separation | - | 0
Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation | - | 0
Audio-visual multi-channel speech separation, dereverberation and recognition | - | 0
Low-Latency Speech Separation Guided Diarization for Telephone Conversations | Code | 1
Target Confusion in End-to-end Speaker Extraction: Analysis and Approaches | - | 0
Speaker Extraction with Co-Speech Gestures Cue | Code | 0
EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers | Code | 0
Coarse-to-Fine Recursive Speech Separation for Unknown Number of Speakers | - | 0
Disentangling the Impacts of Language and Channel Variability on Speech Separation Networks | Code | 0
Remix-cycle-consistent Learning on Adversarially Learned Separator for Accurate and Stable Unsupervised Speech Separation | - | 0
Embedding Recurrent Layers with Dual-Path Strategy in a Variant of Convolutional Network for Speaker-Independent Speech Separation | - | 0
Investigating self-supervised learning for speech enhancement and separation | - | 0
VoViT: Low Latency Graph-based Audio-Visual Voice Separation Transformer | Code | 1
Harmonicity Plays a Critical Role in DNN Based Versus in Biologically-Inspired Monaural Speech Segregation Systems | - | 0
Audio-visual speech separation based on joint feature representation with cross-modal attention | - | 0
Royalflush Speaker Diarization System for ICASSP 2022 Multi-channel Multi-party Meeting Transcription Challenge | - | 0
MixCycle: Unsupervised Speech Separation via Cyclic Mixture Permutation Invariant Training | Code | 1
Exploring Self-Attention Mechanisms for Speech Separation | Code | 0
The RoyalFlush System of Speech Recognition for M2MeT Challenge | - | 0
SkiM: Skipping Memory LSTM for Low-Latency Real-Time Continuous Speech Separation | - | 0
Endpoint Detection for Streaming End-to-End Multi-talker ASR | - | 0
DPCCN: Densely-Connected Pyramid Complex Convolutional Network for Robust Speech Separation And Extraction | Code | 1
Discretization and Re-synthesis: an alternative method to solve the Cocktail Party Problem | - | 0
Directed Speech Separation for Automatic Speech Recognition of Long Form Conversational Speech | - | 0
A Time-domain Real-valued Generalized Wiener Filter for Multi-channel Neural Separation Systems | Code | 1
Multi-Channel Multi-Speaker ASR Using 3D Spatial Feature | - | 0
A Conformer-based ASR Frontend for Joint Acoustic Echo Cancellation, Speech Enhancement and Speech Separation | - | 0
Single-channel speech separation using Soft-minimum Permutation Invariant Training | - | 0
Inter-channel Conv-TasNet for multichannel speech enhancement | - | 0
LiMuSE: Lightweight Multi-modal Speaker Extraction | Code | 1
Continuous Speech Separation with Recurrent Selective Attention Network | - | 0
Separating Long-Form Speech with Group-Wise Permutation Invariant Training | - | 0
REAL-M: Towards Speech Separation on Real Mixtures | Code | 0
Progressive Learning for Stabilizing Label Selection in Speech Separation with Mapping-based Method | - | 0
All-neural beamformer for continuous speech separation | - | 0
VarArray: Array-Geometry-Agnostic Continuous Speech Separation | - | 0
Stepwise-Refining Speech Separation Network via Fine-Grained Encoding in High-order Latent Domain | - | 0
North America Bixby Speaker Diarization System for the VoxCeleb Speaker Recognition Challenge 2021 | - | 0
Continuous Streaming Multi-Talker ASR with Dual-path Transducers | - | 0
Complementing Handcrafted Features with Raw Waveform Using a Light-weight Auxiliary Model | Code | 0
Graph-PIT: Generalized permutation invariant training for continuous separation of arbitrary numbers of speakers | Code | 1
Improving Reverberant Speech Separation with Multi-stage Training and Curriculum Learning | - | 0
Multi-Task Audio Source Separation | Code | 1
A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio | - | 0
Separation Guided Speaker Diarization in Realistic Mismatched Conditions | - | 0
Investigation of Practical Aspects of Single Channel Speech Separation for ASR | - | 0
Sparsely Overlapped Speech Training in the Time Domain: Joint Learning of Target Speech Separation and Personal VAD Benefits | - | 0
Page 4 of 8

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | TF-Locoformer (L) + DM | SI-SDRi | 25.1 | - | Unverified
2 | SepReformer-L | SI-SDRi | 25.1 | - | Unverified
3 | TF-Locoformer (M) + DM | SI-SDRi | 24.6 | - | Unverified
4 | TF-Locoformer (L) | SI-SDRi | 24.2 | - | Unverified
5 | MossFormer2 (L) | SI-SDRi | 24.1 | - | Unverified
6 | SepTDA (L=12) | SI-SDRi | 24 | - | Unverified
7 | Separate And Diffuse | SI-SDRi | 23.9 | - | Unverified
8 | TF-Locoformer (M) | SI-SDRi | 23.6 | - | Unverified
9 | MossFormer (L) + DM | SI-SDRi | 22.8 | - | Unverified
10 | TF-Locoformer (S) + DM | SI-SDRi | 22.8 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TF-Locoformer (M) | SI-SDRi | 18.5 | - | Unverified
2 | TF-Locoformer (S) | SI-SDRi | 17.4 | - | Unverified
3 | SepReformer-L + DM | SI-SDRi | 17.1 | - | Unverified
4 | MossFormer2 | SI-SDRi | 17 | - | Unverified
5 | MossFormer (L) + DM | SI-SDRi | 16.3 | - | Unverified
6 | TD-Conformer (XL) + DM | SI-SDRi | 14.6 | - | Unverified
7 | Improved Sudo rm -rf (U=36) | SI-SDRi | 13.5 | - | Unverified
8 | TD-Conformer (L) + DM | SI-SDRi | 13.4 | - | Unverified
9 | Wavesplit | SI-SDRi | 13.2 | - | Unverified
10 | DPTNET - SRSSN | SI-SDRi | 12.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MossFormer2 (w speed perturb) | SI-SDRi | 22.2 | - | Unverified
2 | TF-Locoformer (M) | SI-SDRi | 22.1 | - | Unverified
3 | MossFormer2 (w/o DM) | SI-SDRi | 21.7 | - | Unverified
4 | Separate And Diffuse | SI-SDRi | 21.5 | - | Unverified
5 | WHYV | SI-SDRi | 17.5 | - | Unverified
6 | TDANet Large | SI-SDRi | 17.4 | - | Unverified
7 | TDANet | SI-SDRi | 16.9 | - | Unverified
8 | Conv-Tasnet (Libri1Mix speech enhancement pre-trained) | SI-SDRi | 14.1 | - | Unverified
9 | Conv-Tasnet (Libri1Mix speech enhancement multi-task) | SI-SDRi | 13.7 | - | Unverified
10 | Conv-Tasnet | SI-SDRi | 13.2 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SepTDA | SI-SDRi | 23.7 | - | Unverified
2 | MossFormer2 | SI-SDRi | 22.2 | - | Unverified
3 | MossFormer (L) + DM | SI-SDRi | 21.2 | - | Unverified
4 | Separate And Diffuse | SI-SDRi | 20.9 | - | Unverified
5 | MossFormer (M) + DM | SI-SDRi | 20.8 | - | Unverified
6 | SepIt | SI-SDRi | 20.1 | - | Unverified
7 | SepFormer | SI-SDRi | 19.5 | - | Unverified
8 | Sandglasset | SI-SDRi | 17.1 | - | Unverified
9 | Gated DualPathRNN | SI-SDRi | 16.85 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | IIANet | SI-SNRi | 16.4 | - | Unverified
2 | TDFNet-large | SI-SNRi | 15.8 | - | Unverified
3 | TDFNet (MHSA + Shared) | SI-SNRi | 15 | - | Unverified
4 | RTFS-Net-12 | SI-SNRi | 14.9 | - | Unverified
5 | RTFS-Net-6 | SI-SNRi | 14.6 | - | Unverified
6 | CTCNet | SI-SNRi | 14.3 | - | Unverified
7 | RTFS-Net-4 | SI-SNRi | 14.1 | - | Unverified
8 | TDFNet-small | SI-SNRi | 13.6 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SepReformer-L + DM | SI-SDRi | 18.4 | - | Unverified
2 | MossFormer2 | SI-SDRi | 18.1 | - | Unverified
3 | MossFormer (L) + DM | SI-SDRi | 17.3 | - | Unverified
4 | TDANet Large | SI-SDRi | 15.2 | - | Unverified
5 | TDANet | SI-SDRi | 14.8 | - | Unverified
6 | WHYV | SI-SDRi | 12.96 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SepTDA | SI-SDRi | 21 | - | Unverified
2 | Hungarian PIT | SI-SDRi | 13.22 | - | Unverified
3 | Conditional TasNet | SI-SDRi | 11.7 | - | Unverified
4 | TasTas | SI-SDRi | 11.14 | - | Unverified
5 | Gated DualPathRNN | SI-SDRi | 10.56 | - | Unverified
6 | Multi-Decoder DPRNN | SI-SDRi | 5.9 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | IIANet | SI-SNRi | 18.3 | - | Unverified
2 | RTFS-Net-12 | SI-SNRi | 17.5 | - | Unverified
3 | CTCNet | SI-SNRi | 17.4 | - | Unverified
4 | RTFS-Net-6 | SI-SNRi | 16.9 | - | Unverified
5 | RTFS-Net-4 | SI-SNRi | 15.5 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | IIANet | SI-SNRi | 14 | - | Unverified
2 | RTFS-Net-12 | SI-SNRi | 12.4 | - | Unverified
3 | CTCNet | SI-SNRi | 11.9 | - | Unverified
4 | RTFS-Net-6 | SI-SNRi | 11.8 | - | Unverified
5 | RTFS-Net-4 | SI-SNRi | 11.5 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SepTDA | SI-SDRi | 22 | - | Unverified
2 | Gated DualPathRNN | SI-SDRi | 12.88 | - | Unverified
3 | Conditional TasNet | SI-SDRi | 12.5 | - | Unverified
4 | OR-PIT | SI-SDRi | 10.2 | - | Unverified
5 | Multi-Decoder DPRNN | SI-SDRi | 9.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Separate And Diffuse | SI-SDRi | 14.2 | - | Unverified
2 | SepIt | SI-SDRi | 13.7 | - | Unverified
3 | OCD | SI-SDRi | 13.4 | - | Unverified
4 | Hungarian PIT | SI-SDRi | 12.72 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Separate And Diffuse | SI-SDRi | 9 | - | Unverified
2 | SepIt | SI-SDRi | 8.2 | - | Unverified
3 | Hungarian PIT | SI-SDRi | 7.78 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | - | SDR | 9.6 | - | Unverified
2 | Audio-Visual concat-ref | SDR | 8.05 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Separate And Diffuse | SI-SDRi | 5.2 | - | Unverified
2 | Hungarian PIT | SI-SDRi | 4.26 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer (base) | 0S | 5.6 | - | Unverified
2 | Conformer (large) | 0S | 5.4 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Hungarian PIT | SI-SDRi | 5.66 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Audio-Visual concat-ref | SDR | 10.55 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MossFormer2 | SI-SDRi | 20.5 | - | Unverified
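
Most of the tables above report SI-SDRi, the improvement in scale-invariant signal-to-distortion ratio (in dB) of a separated estimate over the unprocessed mixture. The sketch below shows one common way to compute it with NumPy, assuming single-channel 1-D signals and zero-mean normalization. The function names `si_sdr` and `si_sdr_improvement` and the synthetic test signals are illustrative assumptions, not code from any listed paper. Individual benchmarks may differ in details such as how estimates are matched to speakers.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between two 1-D signals (hypothetical helper)."""
    # Remove the mean so a DC offset does not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference; the projection is the "target" part.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    # Whatever is left after the projection counts as distortion.
    distortion = estimate - target
    ratio = (np.dot(target, target) + eps) / (np.dot(distortion, distortion) + eps)
    return 10.0 * np.log10(ratio)


def si_sdr_improvement(estimate, reference, mixture):
    """SI-SDRi: SI-SDR of the estimate minus SI-SDR of the raw mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)


# Synthetic stand-ins for two speakers (white noise, not real speech).
rng = np.random.default_rng(0)
speaker_1 = rng.standard_normal(16000)
speaker_2 = rng.standard_normal(16000)
mixture = speaker_1 + speaker_2

# A made-up "separator output" that keeps 10% of the interfering speaker.
estimate = speaker_1 + 0.1 * speaker_2

improvement = si_sdr_improvement(estimate, speaker_1, mixture)
print(f"SI-SDRi: {improvement:.2f} dB")
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score unchanged, which is the property that distinguishes SI-SDR from plain SDR. When a model outputs several speakers in arbitrary order, evaluations typically score the best assignment of estimates to references; that matching step is omitted here.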