Speech Separation

Speech separation is the task of extracting all overlapping speech sources from a given mixed speech signal. It is a special case of the source separation problem in which the focus is only on the overlapping speech sources; other interference, such as music or noise, is not the main concern. A recent representative GitHub project is ClearerVoice-Studio.

Source: A Unified Framework for Speech Separation

Image credit: Speech Separation of A Target Speaker Based on Deep Neural Networks

Papers


Showing 1–10 of 359 papers

Title	Date	Tasks	Status	Hype
Dynamic Slimmable Networks for Efficient Speech Separation	Jul 8, 2025	Speech Separation	Unverified	0
Improving Practical Aspects of End-to-End Multi-Talker Speech Recognition for Online and Offline Scenarios	Jun 17, 2025	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline	May 25, 2025	Speech Extraction, Speech Separation	Code Available	3
Attractor-Based Speech Separation of Multiple Utterances by Unknown Number of Speakers	May 22, 2025	Speech Separation	Unverified	0
Single-Channel Target Speech Extraction Utilizing Distance and Room Clues	May 20, 2025	Speech Extraction, Speech Separation	Unverified	0
Time-Frequency-Based Attention Cache Memory Model for Real-Time Speech Separation	May 19, 2025	Speech Separation	Unverified	0
SepPrune: Structured Pruning for Efficient Deep Speech Separation	May 17, 2025	channel selection, Computational Efficiency	Code Available	1
A Survey of Deep Learning for Complex Speech Spectrograms	May 13, 2025	Deep Learning, Speech Enhancement	Unverified	0
ArrayDPS: Unsupervised Blind Speech Separation with a Diffusion Prior	May 8, 2025	Room Impulse Response (RIR), Speech Separation	Code Available	1
SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer	May 7, 2025	Audio-Visual Speech Recognition, Lip Reading	Unverified	0


Datasets: WSJ0-2mix, WHAMR!, Libri2Mix, WSJ0-3mix, LRS2, WHAM!, WSJ0-5mix, LRS3, VoxCeleb2, WSJ0-4mix, Libri5Mix, Libri10Mix

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	SepTDA	SI-SDRi	23.7	—	Unverified
2	MossFormer2	SI-SDRi	22.2	—	Unverified
3	MossFormer (L) + DM	SI-SDRi	21.2	—	Unverified
4	Separate And Diffuse	SI-SDRi	20.9	—	Unverified
5	MossFormer (M) + DM	SI-SDRi	20.8	—	Unverified
6	SepIt	SI-SDRi	20.1	—	Unverified
7	SepFormer	SI-SDRi	19.5	—	Unverified
8	Sandglasset	SI-SDRi	17.1	—	Unverified
9	Gated DualPathRNN	SI-SDRi	16.85	—	Unverified
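
The table ranks models by SI-SDRi, the improvement in scale-invariant signal-to-distortion ratio (in dB) of the separated output over the unprocessed mixture. The following is a minimal sketch of that computation for a single source, using the common definition of SI-SDR. The function names and the toy sine-wave signals are illustrative assumptions, not taken from any listed paper, and benchmark implementations typically also resolve the speaker permutation across multiple outputs, which this sketch omits.

```python
import math

EPS = 1e-12  # small constant to avoid division by zero (assumed value)


def si_sdr(estimate, target):
    """Scale-invariant SDR in dB between two equal-length sample sequences."""
    n = len(target)
    if n == 0 or len(estimate) != n:
        raise ValueError("estimate and target must be non-empty and equal length")

    # Remove the mean so a DC offset does not affect the score.
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [e - est_mean for e in estimate]
    tgt = [t - tgt_mean for t in target]

    # Project the estimate onto the target to get the optimally scaled target.
    dot = sum(e * t for e, t in zip(est, tgt))
    tgt_energy = sum(t * t for t in tgt)
    alpha = dot / (tgt_energy + EPS)
    scaled = [alpha * t for t in tgt]

    # Everything in the estimate that is not the scaled target counts as distortion.
    signal_energy = sum(s * s for s in scaled)
    noise_energy = sum((e - s) ** 2 for e, s in zip(est, scaled))
    return 10.0 * math.log10((signal_energy + EPS) / (noise_energy + EPS))


def si_sdr_improvement(estimate, mixture, target):
    """SI-SDRi: SI-SDR of the estimate minus SI-SDR of the raw mixture."""
    return si_sdr(estimate, target) - si_sdr(mixture, target)


# Toy example (hypothetical signals): two sine "speakers" summed into a mixture.
n = 800
target = [math.sin(2 * math.pi * 5 * k / n) for k in range(n)]
interferer = [math.sin(2 * math.pi * 13 * k / n) for k in range(n)]
mixture = [t + i for t, i in zip(target, interferer)]

# A pretend separator output that leaves 10% of the interferer amplitude.
estimate = [t + 0.1 * i for t, i in zip(target, interferer)]

improvement = si_sdr_improvement(estimate, mixture, target)
print(f"SI-SDRi: {improvement:.2f} dB")
```

Because the estimate is projected onto the target before the ratio is taken, rescaling the estimate by any nonzero constant leaves the score unchanged, which is why the metric is called scale-invariant.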