SOTAVerified

Speech Separation

Speech separation is the task of extracting all overlapping speech sources from a given mixed speech signal. It is a special case of the source separation problem in which the focus is only on the overlapping speech sources; other interference, such as music or noise, is not the main concern. A recent representative GitHub project is ClearerVoice-Studio.

Source: A Unified Framework for Speech Separation

Image credit: Speech Separation of A Target Speaker Based on Deep Neural Networks

Papers

Showing 251-300 of 359 papers

Title | Status | Hype
Deep Ad-hoc Beamforming Based on Speaker Extraction for Target-Dependent Speech Separation | - | 0
Audio-visual Speech Separation with Adversarially Disentangled Visual Representation | - | 0
Multi-Decoder DPRNN: High Accuracy Source Counting and Separation | Code | 0
Ultra-Lightweight Speech Separation via Group Communication | - | 0
WPD++: An Improved Neural Beamformer for Simultaneous Speech Separation and Dereverberation | Code | 0
Block-Online Guided Source Separation | - | 0
Audio-visual Multi-channel Integration and Recognition of Overlapped Speech | - | 0
Surrogate Source Model Learning for Determined Source Separation | - | 0
On End-to-end Multi-channel Time Domain Speech Separation in Reverberant Environments | - | 0
ESPnet-se: end-to-end speech enhancement and separation toolkit designed for asr integration | - | 0
Integration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysis | - | 0
Speakerfilter-Pro: an improved target speaker extractor combines the time domain and frequency domain | - | 0
X-TaSNet: Robust and Accurate Time-Domain Speaker Extraction Network | - | 0
Speech enhancement aided end-to-end multi-task learning for voice activity detection | - | 0
Towards Listening to 10 People Simultaneously: An Efficient Permutation Invariant Training of Audio Source Separation Using Sinkhorn's Algorithm | - | 0
BERT for Joint Multichannel Speech Dereverberation with Spatial-aware Tasks | - | 0
X-DC: Explainable Deep Clustering based on Learnable Spectrogram Templates | - | 0
An End-to-end Architecture of Online Multi-channel Speech Separation | - | 0
Independent Vector Analysis via Log-Quadratically Penalized Quadratic Minimization | - | 0
Deep Variational Generative Models for Audio-visual Speech Separation | - | 0
ADL-MVDR: All deep learning MVDR beamformer for target speech separation | Code | 0
Efficient Integration of Multi-channel Information for Speaker-independent Speech Separation | - | 0
MIRNet: Learning multiple identities representations in overlapped speech | - | 0
CSLNSpeech: solving extended speech separation problem with the help of Chinese sign language | Code | 0
Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks | - | 0
Exploring the time-domain deep attractor network with two-stream architectures in a reverberant environment | - | 0
Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals | - | 0
Unsupervised Sound Separation Using Mixture Invariant Training | - | 0
Multi-talker ASR for an unknown number of sources: Joint training of source counting, separation and ASR | - | 0
Identify Speakers in Cocktail Parties with End-to-End Attention | Code | 0
Audio-visual Multi-channel Recognition of Overlapped Speech | - | 0
FaceFilter: Audio-visual speech separation using still images | - | 0
Neural Speech Separation Using Spatially Distributed Microphones | - | 0
CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings | - | 0
Simultaneous Denoising and Dereverberation Using Deep Embedding Features | - | 0
Deep Attention Fusion Feature for Speech Separation with End-to-End Post-filter Method | - | 0
Multi-modal Multi-channel Target Speech Separation | - | 0
Enhancing End-to-End Multi-channel Speech Separation via Spatial Feature Learning | - | 0
Wavesplit: End-to-End Speech Separation by Speaker Clustering | - | 0
Spatial and spectral deep attention fusion for multi-channel speech separation using deep embedding features | - | 0
Audio-visual Recognition of Overlapped speech for the LRS2 dataset | - | 0
Temporal-Spatial Neural Filter: Direction Informed End-to-End Multi-channel Target Speech Separation | - | 0
Utterance-level Permutation Invariant Training with Latency-controlled BLSTM for Single-channel Multi-talker Speech Separation | - | 0
End-to-end training of time domain audio separation and recognition | - | 0
A Unified Framework for Speech Separation | - | 0
Advances in Online Audio-Visual Meeting Transcription | - | 0
MITAS: A Compressed Time-Domain Audio Separation Network with Parameter Sharing | - | 0
Audio-Visual Target Speaker Enhancement on Multi-Talker Environment using Event-Driven Cameras | - | 0
Improving Voice Separation by Incorporating End-to-end Speech Recognition | Code | 0
Demystifying TasNet: A Dissecting Approach | - | 0

Benchmark Results
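
Most of the results listed here are reported as SI-SDRi, the improvement in scale-invariant signal-to-distortion ratio of the separated estimate over the unprocessed mixture, in dB. The page itself does not define the metric, so the following is only a minimal sketch under the commonly used definition. The function names `si_sdr` and `si_sdr_improvement`, the mean-removal step, the small `eps` guard, and the toy sine signals are all illustrative assumptions, not taken from any of the listed papers.

```python
import math


def si_sdr(estimate, target, eps=1e-12):
    """Scale-invariant SDR in dB between two equal-length sample sequences."""
    # Remove the mean of each signal (a common convention, assumed here).
    est_mean = sum(estimate) / len(estimate)
    tgt_mean = sum(target) / len(target)
    est = [e - est_mean for e in estimate]
    tgt = [t - tgt_mean for t in target]

    # Project the estimate onto the target to get the optimally scaled target.
    dot = sum(e * t for e, t in zip(est, tgt))
    tgt_energy = sum(t * t for t in tgt) + eps
    scale = dot / tgt_energy
    projection = [scale * t for t in tgt]

    # Everything in the estimate that is not the scaled target counts as distortion.
    residual = [e - p for e, p in zip(est, projection)]
    signal_energy = sum(p * p for p in projection)
    residual_energy = sum(r * r for r in residual)
    return 10.0 * math.log10((signal_energy + eps) / (residual_energy + eps))


def si_sdr_improvement(estimate, mixture, target):
    """SI-SDRi: SI-SDR of the estimate minus SI-SDR of the raw mixture."""
    return si_sdr(estimate, target) - si_sdr(mixture, target)


# Toy example: two orthogonal sinusoids stand in for two speakers.
N = 1000
target = [math.sin(2 * math.pi * 5 * n / N) for n in range(N)]
interferer = [math.sin(2 * math.pi * 13 * n / N) for n in range(N)]
mixture = [t + i for t, i in zip(target, interferer)]
# A hypothetical separator output that attenuates the interferer by a factor of 10.
estimate = [t + 0.1 * i for t, i in zip(target, interferer)]

improvement = si_sdr_improvement(estimate, mixture, target)
print(f"SI-SDRi: {improvement:.2f} dB")
```

In this toy case the mixture has an SI-SDR of about 0 dB and the estimate about 20 dB, so the improvement is roughly 20 dB. The score is unchanged if the estimate is multiplied by any non-zero constant, which is the scale invariance the metric is named for.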

# | Model | Metric | Claimed | Verified | Status
1 | TF-Locoformer (L) + DM | SI-SDRi | 25.1 | - | Unverified
2 | SepReformer-L | SI-SDRi | 25.1 | - | Unverified
3 | TF-Locoformer (M) + DM | SI-SDRi | 24.6 | - | Unverified
4 | TF-Locoformer (L) | SI-SDRi | 24.2 | - | Unverified
5 | MossFormer2 (L) | SI-SDRi | 24.1 | - | Unverified
6 | SepTDA (L=12) | SI-SDRi | 24 | - | Unverified
7 | Separate And Diffuse | SI-SDRi | 23.9 | - | Unverified
8 | TF-Locoformer (M) | SI-SDRi | 23.6 | - | Unverified
9 | MossFormer (L) + DM | SI-SDRi | 22.8 | - | Unverified
10 | TF-Locoformer (S) + DM | SI-SDRi | 22.8 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | TF-Locoformer (M) | SI-SDRi | 18.5 | - | Unverified
2 | TF-Locoformer (S) | SI-SDRi | 17.4 | - | Unverified
3 | SepReformer-L + DM | SI-SDRi | 17.1 | - | Unverified
4 | MossFormer2 | SI-SDRi | 17 | - | Unverified
5 | MossFormer (L) + DM | SI-SDRi | 16.3 | - | Unverified
6 | TD-Conformer (XL) + DM | SI-SDRi | 14.6 | - | Unverified
7 | Improved Sudo rm -rf (U=36) | SI-SDRi | 13.5 | - | Unverified
8 | TD-Conformer (L) + DM | SI-SDRi | 13.4 | - | Unverified
9 | Wavesplit | SI-SDRi | 13.2 | - | Unverified
10 | DPTNET - SRSSN | SI-SDRi | 12.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MossFormer2 (w speed perturb) | SI-SDRi | 22.2 | - | Unverified
2 | TF-Locoformer (M) | SI-SDRi | 22.1 | - | Unverified
3 | MossFormer2 (w/o DM) | SI-SDRi | 21.7 | - | Unverified
4 | Separate And Diffuse | SI-SDRi | 21.5 | - | Unverified
5 | WHYV | SI-SDRi | 17.5 | - | Unverified
6 | TDANet Large | SI-SDRi | 17.4 | - | Unverified
7 | TDANet | SI-SDRi | 16.9 | - | Unverified
8 | Conv-Tasnet (Libri1Mix speech enhancement pre-trained) | SI-SDRi | 14.1 | - | Unverified
9 | Conv-Tasnet (Libri1Mix speech enhancement multi-task) | SI-SDRi | 13.7 | - | Unverified
10 | Conv-Tasnet | SI-SDRi | 13.2 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SepTDA | SI-SDRi | 23.7 | - | Unverified
2 | MossFormer2 | SI-SDRi | 22.2 | - | Unverified
3 | MossFormer (L) + DM | SI-SDRi | 21.2 | - | Unverified
4 | Separate And Diffuse | SI-SDRi | 20.9 | - | Unverified
5 | MossFormer (M) + DM | SI-SDRi | 20.8 | - | Unverified
6 | SepIt | SI-SDRi | 20.1 | - | Unverified
7 | SepFormer | SI-SDRi | 19.5 | - | Unverified
8 | Sandglasset | SI-SDRi | 17.1 | - | Unverified
9 | Gated DualPathRNN | SI-SDRi | 16.85 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | IIANet | SI-SNRi | 16.4 | - | Unverified
2 | TDFNet-large | SI-SNRi | 15.8 | - | Unverified
3 | TDFNet (MHSA + Shared) | SI-SNRi | 15 | - | Unverified
4 | RTFS-Net-12 | SI-SNRi | 14.9 | - | Unverified
5 | RTFS-Net-6 | SI-SNRi | 14.6 | - | Unverified
6 | CTCNet | SI-SNRi | 14.3 | - | Unverified
7 | RTFS-Net-4 | SI-SNRi | 14.1 | - | Unverified
8 | TDFNet-small | SI-SNRi | 13.6 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SepReformer-L + DM | SI-SDRi | 18.4 | - | Unverified
2 | MossFormer2 | SI-SDRi | 18.1 | - | Unverified
3 | MossFormer (L) + DM | SI-SDRi | 17.3 | - | Unverified
4 | TDANet Large | SI-SDRi | 15.2 | - | Unverified
5 | TDANet | SI-SDRi | 14.8 | - | Unverified
6 | WHYV | SI-SDRi | 12.96 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SepTDA | SI-SDRi | 21 | - | Unverified
2 | Hungarian PIT | SI-SDRi | 13.22 | - | Unverified
3 | Conditional TasNet | SI-SDRi | 11.7 | - | Unverified
4 | TasTas | SI-SDRi | 11.14 | - | Unverified
5 | Gated DualPathRNN | SI-SDRi | 10.56 | - | Unverified
6 | Multi-Decoder DPRNN | SI-SDRi | 5.9 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | IIANet | SI-SNRi | 18.3 | - | Unverified
2 | RTFS-Net-12 | SI-SNRi | 17.5 | - | Unverified
3 | CTCNet | SI-SNRi | 17.4 | - | Unverified
4 | RTFS-Net-6 | SI-SNRi | 16.9 | - | Unverified
5 | RTFS-Net-4 | SI-SNRi | 15.5 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | IIANet | SI-SNRi | 14 | - | Unverified
2 | RTFS-Net-12 | SI-SNRi | 12.4 | - | Unverified
3 | CTCNet | SI-SNRi | 11.9 | - | Unverified
4 | RTFS-Net-6 | SI-SNRi | 11.8 | - | Unverified
5 | RTFS-Net-4 | SI-SNRi | 11.5 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | SepTDA | SI-SDRi | 22 | - | Unverified
2 | Gated DualPathRNN | SI-SDRi | 12.88 | - | Unverified
3 | Conditional TasNet | SI-SDRi | 12.5 | - | Unverified
4 | OR-PIT | SI-SDRi | 10.2 | - | Unverified
5 | Multi-Decoder DPRNN | SI-SDRi | 9.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Separate And Diffuse | SI-SDRi | 14.2 | - | Unverified
2 | SepIt | SI-SDRi | 13.7 | - | Unverified
3 | OCD | SI-SDRi | 13.4 | - | Unverified
4 | Hungarian PIT | SI-SDRi | 12.72 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Separate And Diffuse | SI-SDRi | 9 | - | Unverified
2 | SepIt | SI-SDRi | 8.2 | - | Unverified
3 | Hungarian PIT | SI-SDRi | 7.78 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | - | SDR | 9.6 | - | Unverified
2 | Audio-Visual concat-ref | SDR | 8.05 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Separate And Diffuse | SI-SDRi | 5.2 | - | Unverified
2 | Hungarian PIT | SI-SDRi | 4.26 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Conformer (base) | 0S | 5.6 | - | Unverified
2 | Conformer (large) | 0S | 5.4 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Hungarian PIT | SI-SDRi | 5.66 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Audio-Visual concat-ref | SDR | 10.55 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | MossFormer2 | SI-SDRi | 20.5 | - | Unverified