
Speech Separation

Speech separation is the task of extracting all overlapping speech sources from a given mixed speech signal. It is a special case of the source separation problem in which the focus is only on the overlapping speech sources; other interferences, such as music or noise, are not the main concern. ClearerVoice-Studio is a recent representative GitHub project.
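The definition above assumes the usual linear mixing model: the observed signal is the sum of the individual speech sources, and a separator must recover each source from that sum alone. Because a separator has no inherent way to know which output slot belongs to which speaker, its outputs are typically matched to the references over all permutations before scoring or computing a training loss. The Python sketch below illustrates both ideas under stated assumptions: it uses NumPy, synthetic sinusoids stand in for real speakers, and the helper names `mix` and `best_permutation` are hypothetical, not taken from any project listed on this page. Mean squared error is used as the matching criterion for simplicity; SI-SDR is a common alternative.

```python
import itertools

import numpy as np


def mix(sources: np.ndarray) -> np.ndarray:
    """Linear mixing model: the observed signal is the sum of the sources (rows)."""
    return sources.sum(axis=0)


def best_permutation(estimates: np.ndarray, references: np.ndarray) -> tuple:
    """Brute-force search for the output ordering with the lowest mean squared error.

    Checks all n! orderings, which is only practical for a small number of speakers.
    """
    best_perm, best_err = None, np.inf
    for perm in itertools.permutations(range(len(references))):
        err = float(np.mean((estimates[list(perm)] - references) ** 2))
        if err < best_err:
            best_perm, best_err = perm, err
    return best_perm


# Two synthetic "speakers" (sinusoids) standing in for real utterances.
t = np.arange(8000) / 8000.0
sources = np.stack([np.sin(2 * np.pi * 220 * t), 0.5 * np.sin(2 * np.pi * 330 * t)])
mixture = mix(sources)

# A hypothetical separator that recovers both voices but in swapped output order.
estimates = sources[::-1]
perm = best_permutation(estimates, sources)
print(perm)  # (1, 0): output 1 matches speaker 0, output 0 matches speaker 1
aligned = estimates[list(perm)]
```

The exhaustive search grows as n!, so for larger speaker counts the same matching is usually posed as an assignment problem and solved with, for example, the Hungarian algorithm (SciPy provides `scipy.optimize.linear_sum_assignment`).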

Source: A Unified Framework for Speech Separation

Image credit: Speech Separation of A Target Speaker Based on Deep Neural Networks

Papers

Showing 101-150 of 359 papers

Title | Status | Hype
DualSep: A Light-weight dual-encoder convolutional recurrent network for real-time in-car speech separation | - | 0
LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization | - | 0
Improving Generalization of Speech Separation in Real-World Scenarios: Strategies in Simulation, Optimization, and Evaluation | - | 0
Robustness of Speech Separation Models for Similar-pitch Speakers | - | 0
TalTech-IRIT-LIS Speaker and Language Diarization Systems for DISPLACE 2024 | - | 0
Knowledge boosting during low-latency inference | Code | 0
Audio-Visual Approach For Multimodal Concurrent Speaker Detection | - | 0
Enhanced Deep Speech Separation in Clustered Ad Hoc Distributed Microphone Environments | - | 0
Transcription-Free Fine-Tuning of Speech Separation Models for Noisy and Reverberant Multi-Speaker Automatic Speech Recognition | - | 0
Multimodal Representation Loss Between Timed Text and Audio for Regularized Speech Separation | - | 0
Effects of Dataset Sampling Rate for Noise Cancellation through Deep Learning | - | 0
Cross-Talk Reduction | - | 0
Robust Active Speaker Detection in Noisy Environments | - | 0
Probing Self-supervised Learning Models with Target Speech Extraction | - | 0
Mixture to Mixture: Leveraging Close-talk Mixtures as Weak-supervision for Speech Separation | - | 0
Boosting Unknown-number Speaker Separation with Transformer Decoder-based Attractor | - | 0
Resource-constrained stereo singing voice cancellation | - | 0
Multi-Input Multi-Output Target-Speaker Voice Activity Detection For Unified, Flexible, and Robust Audio-Visual Speaker Diarization | - | 0
Hyperbolic Distance-Based Speech Separation | - | 0
Single-Microphone Speaker Separation and Voice Activity Detection in Noisy and Reverberant Environments | - | 0
Improving Label Assignments Learning by Dynamic Sample Dropout Combined with Layer-wise Optimization in Speech Separation | - | 0
Seeing Through the Conversation: Audio-Visual Speech Separation based on Diffusion Model | - | 0
Real-time Speech Enhancement and Separation with a Unified Deep Neural Network for Single/Dual Talker Scenarios | - | 0
A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, and Extraction | - | 0
GASS: Generalizing Audio Source Separation with Large-scale Data | - | 0
Meeting Recognition with Continuous Speech Separation and Transcription-Supported Diarization | - | 0
SPGM: Prioritizing Local Features for enhanced speech separation performance | Code | 0
Combining TF-GridNet and Mixture Encoder for Continuous Speech Separation for Meeting Transcription | - | 0
TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition | - | 0
Improving Deep Attractor Network by BGRU and GMM for Speech Separation | - | 0
Monaural Multi-Speaker Speech Separation Using Efficient Transformer Model | - | 0
Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation | - | 0
Audio-visual End-to-end Multi-channel Speech Separation, Dereverberation and Recognition | - | 0
Enhanced Neural Beamformer with Spatial Information for Target Speech Extraction | - | 0
Mixture Encoder for Joint Speech Separation and Recognition | - | 0
Multi-Loss Convolutional Network with Time-Frequency Attention for Speech Enhancement | - | 0
An Efficient Speech Separation Network Based on Recurrent Fusion Dilated Convolution and Channel Attention | - | 0
UNSSOR: Unsupervised Neural Speech Separation by Leveraging Over-determined Training Mixtures | - | 0
An Experimental Review of Speaker Diarization methods with application to Two-Speaker Conversational Telephone Speech recordings | - | 0
Locate and Beamform: Two-dimensional Locating All-neural Beamformer for Multi-channel Speech Separation | - | 0
Speech Separation based on Contrastive Learning and Deep Modularization | - | 0
Diffusion-based Signal Refiner for Speech Separation | - | 0
AudioSlots: A slot-centric generative model for audio separation | - | 0
Deep Learning for Joint Acoustic Echo and Acoustic Howling Suppression in Hybrid Meetings | - | 0
Multi-channel Speech Separation Using Spatially Selective Deep Non-linear Filters | - | 0
On Data Sampling Strategies for Training Neural Network Speech Separation Models | - | 0
End-to-End Integration of Speech Separation and Voice Activity Detection for Low-Latency Diarization of Telephone Conversations | - | 0
Towards Real-Time Single-Channel Speech Separation in Noisy and Reverberant Environments | - | 0
Learning-based Robust Speaker Counting and Separation with the Aid of Spatial Coherence | - | 0
Online Binaural Speech Separation of Moving Speakers With a Wavesplit Network | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | SepReformer-L | SI-SDRi | 25.1 | - | Unverified
2 | TF-Locoformer (L) + DM | SI-SDRi | 25.1 | - | Unverified
3 | TF-Locoformer (M) + DM | SI-SDRi | 24.6 | - | Unverified
4 | TF-Locoformer (L) | SI-SDRi | 24.2 | - | Unverified
5 | MossFormer2 (L) | SI-SDRi | 24.1 | - | Unverified
6 | SepTDA (L=12) | SI-SDRi | 24 | - | Unverified
7 | Separate And Diffuse | SI-SDRi | 23.9 | - | Unverified
8 | TF-Locoformer (M) | SI-SDRi | 23.6 | - | Unverified
9 | TF-Locoformer (S) + DM | SI-SDRi | 22.8 | - | Unverified
10 | MossFormer (L) + DM | SI-SDRi | 22.8 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TF-Locoformer (M) | SI-SDRi | 18.5 | - | Unverified
2 | TF-Locoformer (S) | SI-SDRi | 17.4 | - | Unverified
3 | SepReformer-L + DM | SI-SDRi | 17.1 | - | Unverified
4 | MossFormer2 | SI-SDRi | 17 | - | Unverified
5 | MossFormer (L) + DM | SI-SDRi | 16.3 | - | Unverified
6 | TD-Conformer (XL) + DM | SI-SDRi | 14.6 | - | Unverified
7 | Improved Sudo rm -rf (U=36) | SI-SDRi | 13.5 | - | Unverified
8 | TD-Conformer (L) + DM | SI-SDRi | 13.4 | - | Unverified
9 | Wavesplit | SI-SDRi | 13.2 | - | Unverified
10 | DPTNET - SRSSN | SI-SDRi | 12.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MossFormer2 (w speed perturb) | SI-SDRi | 22.2 | - | Unverified
2 | TF-Locoformer (M) | SI-SDRi | 22.1 | - | Unverified
3 | MossFormer2 (w/o DM) | SI-SDRi | 21.7 | - | Unverified
4 | Separate And Diffuse | SI-SDRi | 21.5 | - | Unverified
5 | WHYV | SI-SDRi | 17.5 | - | Unverified
6 | TDANet Large | SI-SDRi | 17.4 | - | Unverified
7 | TDANet | SI-SDRi | 16.9 | - | Unverified
8 | Conv-Tasnet (Libri1Mix speech enhancement pre-trained) | SI-SDRi | 14.1 | - | Unverified
9 | Conv-Tasnet (Libri1Mix speech enhancement multi-task) | SI-SDRi | 13.7 | - | Unverified
10 | Conv-Tasnet | SI-SDRi | 13.2 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SepTDA | SI-SDRi | 23.7 | - | Unverified
2 | MossFormer2 | SI-SDRi | 22.2 | - | Unverified
3 | MossFormer (L) + DM | SI-SDRi | 21.2 | - | Unverified
4 | Separate And Diffuse | SI-SDRi | 20.9 | - | Unverified
5 | MossFormer (M) + DM | SI-SDRi | 20.8 | - | Unverified
6 | SepIt | SI-SDRi | 20.1 | - | Unverified
7 | SepFormer | SI-SDRi | 19.5 | - | Unverified
8 | Sandglasset | SI-SDRi | 17.1 | - | Unverified
9 | Gated DualPathRNN | SI-SDRi | 16.85 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | IIANet | SI-SNRi | 16.4 | - | Unverified
2 | TDFNet-large | SI-SNRi | 15.8 | - | Unverified
3 | TDFNet (MHSA + Shared) | SI-SNRi | 15 | - | Unverified
4 | RTFS-Net-12 | SI-SNRi | 14.9 | - | Unverified
5 | RTFS-Net-6 | SI-SNRi | 14.6 | - | Unverified
6 | CTCNet | SI-SNRi | 14.3 | - | Unverified
7 | RTFS-Net-4 | SI-SNRi | 14.1 | - | Unverified
8 | TDFNet-small | SI-SNRi | 13.6 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SepReformer-L + DM | SI-SDRi | 18.4 | - | Unverified
2 | MossFormer2 | SI-SDRi | 18.1 | - | Unverified
3 | MossFormer (L) + DM | SI-SDRi | 17.3 | - | Unverified
4 | TDANet Large | SI-SDRi | 15.2 | - | Unverified
5 | TDANet | SI-SDRi | 14.8 | - | Unverified
6 | WHYV | SI-SDRi | 12.96 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SepTDA | SI-SDRi | 21 | - | Unverified
2 | Hungarian PIT | SI-SDRi | 13.22 | - | Unverified
3 | Conditional TasNet | SI-SDRi | 11.7 | - | Unverified
4 | TasTas | SI-SDRi | 11.14 | - | Unverified
5 | Gated DualPathRNN | SI-SDRi | 10.56 | - | Unverified
6 | Multi-Decoder DPRNN | SI-SDRi | 5.9 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | IIANet | SI-SNRi | 18.3 | - | Unverified
2 | RTFS-Net-12 | SI-SNRi | 17.5 | - | Unverified
3 | CTCNet | SI-SNRi | 17.4 | - | Unverified
4 | RTFS-Net-6 | SI-SNRi | 16.9 | - | Unverified
5 | RTFS-Net-4 | SI-SNRi | 15.5 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | IIANet | SI-SNRi | 14 | - | Unverified
2 | RTFS-Net-12 | SI-SNRi | 12.4 | - | Unverified
3 | CTCNet | SI-SNRi | 11.9 | - | Unverified
4 | RTFS-Net-6 | SI-SNRi | 11.8 | - | Unverified
5 | RTFS-Net-4 | SI-SNRi | 11.5 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SepTDA | SI-SDRi | 22 | - | Unverified
2 | Gated DualPathRNN | SI-SDRi | 12.88 | - | Unverified
3 | Conditional TasNet | SI-SDRi | 12.5 | - | Unverified
4 | OR-PIT | SI-SDRi | 10.2 | - | Unverified
5 | Multi-Decoder DPRNN | SI-SDRi | 9.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Separate And Diffuse | SI-SDRi | 14.2 | - | Unverified
2 | SepIt | SI-SDRi | 13.7 | - | Unverified
3 | OCD | SI-SDRi | 13.4 | - | Unverified
4 | Hungarian PIT | SI-SDRi | 12.72 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Separate And Diffuse | SI-SDRi | 9 | - | Unverified
2 | SepIt | SI-SDRi | 8.2 | - | Unverified
3 | Hungarian PIT | SI-SDRi | 7.78 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | - | SDR | 9.6 | - | Unverified
2 | Audio-Visual concat-ref | SDR | 8.05 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Separate And Diffuse | SI-SDRi | 5.2 | - | Unverified
2 | Hungarian PIT | SI-SDRi | 4.26 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer (base) | 0S | 5.6 | - | Unverified
2 | Conformer (large) | 0S | 5.4 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Hungarian PIT | SI-SDRi | 5.66 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Audio-Visual concat-ref | SDR | 10.55 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MossFormer2 | SI-SDRi | 20.5 | - | Unverified
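Most of the benchmark tables report SI-SDRi (or SI-SNRi): the improvement, in dB, of the scale-invariant signal-to-distortion ratio of a separated signal over that of the unprocessed mixture, both measured against the same reference speaker. In much of the literature SI-SNR is computed with the same formula, so the two labels are often used interchangeably. The sketch below is a minimal NumPy version, assuming zero-mean normalization and a small `eps` for numerical safety; toolkits differ in such details, so absolute numbers may not match published results exactly. The function names and the synthetic signals are illustrative only.

```python
import numpy as np


def si_sdr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant SDR in dB between a 1-D estimate and its reference."""
    # Remove the DC offset so a constant bias is not counted as signal.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference: the optimally scaled target.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    # Everything not explained by the scaled reference counts as distortion.
    distortion = estimate - target
    return float(10 * np.log10((np.dot(target, target) + eps) / (np.dot(distortion, distortion) + eps)))


def si_sdr_improvement(estimate: np.ndarray, reference: np.ndarray, mixture: np.ndarray) -> float:
    """SI-SDRi: gain of the estimate over simply using the mixture as the estimate."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)


# Synthetic example: two orthogonal sinusoids stand in for two speakers.
t = np.arange(8000) / 8000.0
speaker_1 = np.sin(2 * np.pi * 220 * t)
speaker_2 = 0.5 * np.sin(2 * np.pi * 330 * t)
mixture = speaker_1 + speaker_2
# A hypothetical separator output that keeps 10% of the interfering speaker's amplitude.
estimate = speaker_1 + 0.1 * speaker_2

print(round(si_sdr_improvement(estimate, speaker_1, mixture), 2))  # 20.0
```

Reducing the leaked interferer's amplitude by a factor of 10 lowers its energy by a factor of 100, which is why the improvement comes out at 20 dB. Rescaling `estimate` by any nonzero constant leaves the score essentially unchanged, which is the point of the scale-invariant formulation.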