Target Sound Extraction

Target Sound Extraction is the task of extracting the sound that belongs to a given class from an audio mixture. The mixture may also contain background noise at a relatively low amplitude compared with the foreground components. The target sound class is provided to the model as an input, in the form of a string, an integer, or a one-hot encoding.
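The three ways of specifying the target class can be illustrated with a minimal sketch. The label vocabulary `SOUND_CLASSES` and the helper names `one_hot` and `encode_query` are hypothetical and chosen only for illustration; each model on this page defines its own class set and conditioning scheme.

```python
# Hypothetical label vocabulary; a real model ships its own class list.
SOUND_CLASSES = ["dog_bark", "siren", "speech", "door_knock"]


def one_hot(index, num_classes):
    """Return a list of floats with 1.0 at `index` and 0.0 elsewhere."""
    if not 0 <= index < num_classes:
        raise ValueError(f"class index {index} out of range")
    vector = [0.0] * num_classes
    vector[index] = 1.0
    return vector


def encode_query(query):
    """Map a class given as a string or an integer to a one-hot vector."""
    if isinstance(query, str):
        # String query: look up its position in the vocabulary.
        index = SOUND_CLASSES.index(query)
    else:
        # Integer query: already a class index.
        index = int(query)
    return one_hot(index, len(SOUND_CLASSES))


print(encode_query("siren"))  # string form
print(encode_query(1))        # integer form, same class
```

Both calls select the same class, so they produce the same conditioning vector. In practice this vector (or a learned embedding of it) would be passed to the extraction model alongside the mixture.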

Papers


Showing 1–16 of 16 papers

Title	Date	Tasks	Status	Hype
Real-Time Target Sound Extraction	Nov 4, 2022	Decoder, Streaming Target Sound Extraction	Code Available	2
SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer	Sep 12, 2024	Target Sound Extraction	Code Available	2
Target Sound Extraction with Variable Cross-modality Clues	Mar 15, 2023	AudioCaps, Target Sound Extraction	Code Available	1
Cross-attention Inspired Selective State Space Models for Target Sound Extraction	Sep 7, 2024	Computational Efficiency, Mamba	Code Available	1
DPM-TSE: A Diffusion Probabilistic Model for Target Sound Extraction	Oct 6, 2023	Target Sound Extraction	Code Available	1
Semantic Hearing: Programming Acoustic Scenes with Binaural Hearables	Nov 1, 2023	Target Sound Extraction	Code Available	1
CLAPSep: Leveraging Contrastive Pre-trained Model for Multi-Modal Query-Conditioned Target Sound Extraction	Feb 27, 2024	Target Sound Extraction	Code Available	1
SoundSculpt: Direction and Semantics Driven Ambisonic Target Sound Extraction	May 30, 2025	Image Segmentation, Semantic Segmentation	Unverified	0
Multichannel-to-Multichannel Target Sound Extraction Using Direction and Timestamp Clues	Sep 19, 2024	Inductive Bias, Target Sound Extraction	Unverified	0
CATSE: A Context-Aware Framework for Causal Target Sound Extraction	Mar 21, 2024	Target Sound Extraction	Unverified	0
Few-shot learning of new sound classes for target sound extraction	Jun 14, 2021	Few-Shot Learning, Target Sound Extraction	Unverified	0
Language-Queried Target Sound Extraction Without Parallel Training Data	Sep 14, 2024	Language Modelling, Large Language Model	Unverified	0
Leveraging Audio-Only Data for Text-Queried Target Sound Extraction	Sep 20, 2024	Target Sound Extraction	Unverified	0
Online Similarity-and-Independence-Aware Beamformer for Low-latency Target Sound Extraction	Dec 27, 2023	Blind Source Separation, Target Sound Extraction	Unverified	0
SoundBeam: Target sound extraction conditioned on sound-class labels and enrollment clues for increased performance and continuous learning	Apr 8, 2022	Target Sound Extraction	Unverified	0
Can all variations within the unified mask-based beamformer framework achieve identical peak extraction performance?	Jul 22, 2024	All, Target Sound Extraction	Code Available	0


Datasets: AudioCaps, AudioSet, FSDSoundScapes

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	CLAPSep	SDRi	10.08	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	CLAPSep	SDRi	9.29	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Waveformer	SI-SNRi	9.43	—	Unverified
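For readers unfamiliar with the SI-SNRi metric reported above, the following is a minimal sketch of how scale-invariant signal-to-noise ratio improvement is commonly computed: the SI-SNR of the model output minus the SI-SNR of the unprocessed mixture, both measured against the clean target. The function names and the toy sine-wave signals are assumptions made for illustration; this is not the evaluation code of any listed paper, and it does not cover SDRi, which is defined differently.

```python
import math


def _zero_mean(signal):
    """Remove the mean so the metric ignores any DC offset."""
    mean = sum(signal) / len(signal)
    return [v - mean for v in signal]


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB of `estimate` with respect to `reference`."""
    est = _zero_mean(estimate)
    ref = _zero_mean(reference)
    # Project the estimate onto the reference to get the target component.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    scale = dot / ref_energy
    target = [scale * r for r in ref]
    # Everything left after removing the target component counts as noise.
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


def si_snr_improvement(estimate, mixture, reference):
    """SI-SNRi: gain of the estimate over the unprocessed mixture, in dB."""
    return si_snr(estimate, reference) - si_snr(mixture, reference)


# Toy signals: a 5-cycle sine as the target, a 13-cycle sine as interference.
n = 200
reference = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
interferer = [math.sin(2 * math.pi * 13 * i / n) for i in range(n)]
mixture = [r + d for r, d in zip(reference, interferer)]
# Pretend a model attenuated the interferer to one tenth of its amplitude.
estimate = [r + 0.1 * d for r, d in zip(reference, interferer)]

improvement = si_snr_improvement(estimate, mixture, reference)
print(f"SI-SNRi: {improvement:.2f} dB")
```

Because the interferer's amplitude is reduced by a factor of ten in this toy setup, the improvement comes out at roughly 20 dB. Rescaling the estimate does not change the score, which is the point of the scale-invariant formulation.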