SOTAVerified

Target Sound Extraction

Target Sound Extraction is the task of extracting the sound corresponding to a given class from an audio mixture. The mixture may contain background noise whose amplitude is low relative to the foreground components. The target sound class is provided as input to the model in the form of a string, an integer, or a one-hot encoding.
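The three query forms mentioned above can be sketched as follows. This is a minimal illustration only: the label list `SOUND_CLASSES` and the helper names `class_to_index` and `one_hot` are hypothetical and not taken from any of the listed papers, each of which defines its own class vocabulary and conditioning interface.

```python
from __future__ import annotations

# Hypothetical class vocabulary (an assumption for illustration only).
SOUND_CLASSES = ["dog_bark", "siren", "speech", "door_knock"]


def class_to_index(label: str) -> int:
    """Map a string query to its integer class index."""
    return SOUND_CLASSES.index(label)


def one_hot(index: int, num_classes: int = len(SOUND_CLASSES)) -> list[float]:
    """Map an integer class index to a one-hot vector."""
    if not 0 <= index < num_classes:
        raise ValueError(f"class index {index} out of range")
    vector = [0.0] * num_classes
    vector[index] = 1.0
    return vector


query_string = "siren"                        # string form
query_index = class_to_index(query_string)    # integer form
query_one_hot = one_hot(query_index)          # one-hot form
print(query_index, query_one_hot)             # prints: 1 [0.0, 1.0, 0.0, 0.0]
```

All three forms carry the same information; which one a model accepts depends on how its conditioning input is built.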

Papers

Showing 1–16 of 16 papers

Title | Status | Hype
Real-Time Target Sound Extraction | Code | 2
SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer | Code | 2
Target Sound Extraction with Variable Cross-modality Clues | Code | 1
Cross-attention Inspired Selective State Space Models for Target Sound Extraction | Code | 1
DPM-TSE: A Diffusion Probabilistic Model for Target Sound Extraction | Code | 1
Semantic Hearing: Programming Acoustic Scenes with Binaural Hearables | Code | 1
CLAPSep: Leveraging Contrastive Pre-trained Model for Multi-Modal Query-Conditioned Target Sound Extraction | Code | 1
SoundSculpt: Direction and Semantics Driven Ambisonic Target Sound Extraction | - | 0
Multichannel-to-Multichannel Target Sound Extraction Using Direction and Timestamp Clues | - | 0
CATSE: A Context-Aware Framework for Causal Target Sound Extraction | - | 0
Few-shot learning of new sound classes for target sound extraction | - | 0
Language-Queried Target Sound Extraction Without Parallel Training Data | - | 0
Leveraging Audio-Only Data for Text-Queried Target Sound Extraction | - | 0
Online Similarity-and-Independence-Aware Beamformer for Low-latency Target Sound Extraction | - | 0
SoundBeam: Target sound extraction conditioned on sound-class labels and enrollment clues for increased performance and continuous learning | - | 0
Can all variations within the unified mask-based beamformer framework achieve identical peak extraction performance? | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | CLAPSep | SDRi | 10.08 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CLAPSep | SDRi | 9.29 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Waveformer | SI-SNRi | 9.43 | - | Unverified
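
For readers unfamiliar with the SI-SNRi metric in the table above, the sketch below computes it under the commonly used definition: the scale-invariant signal-to-noise ratio of the extracted signal minus that of the unprocessed mixture, both measured against the ground-truth target. It is an assumption that the listed benchmark follows exactly this definition; the function names and the toy signals are hypothetical, and SDRi implementations differ further in how they treat scaling and filtering.

```python
import math


def _zero_mean(signal):
    """Remove the mean so the metric ignores constant offsets."""
    mean = sum(signal) / len(signal)
    return [v - mean for v in signal]


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB of `estimate` with respect to `target`."""
    est = _zero_mean(estimate)
    ref = _zero_mean(target)
    # Project the estimate onto the target to get the "signal" part.
    scale = sum(e * r for e, r in zip(est, ref)) / (sum(r * r for r in ref) + eps)
    projection = [scale * r for r in ref]
    # Everything that is not explained by the target counts as noise.
    noise = [e - p for e, p in zip(est, projection)]
    signal_energy = sum(p * p for p in projection)
    noise_energy = sum(n * n for n in noise)
    return 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))


def si_snr_improvement(estimate, mixture, target):
    """SI-SNRi: gain of the extracted signal over the raw mixture, in dB."""
    return si_snr(estimate, target) - si_snr(mixture, target)


# Toy signals (assumed for illustration): the interference is orthogonal
# to the target, and the "model" suppresses it by a factor of 10.
target = [1.0, -1.0, 1.0, -1.0]
interference = [1.0, 1.0, -1.0, -1.0]
mixture = [t + i for t, i in zip(target, interference)]
estimate = [t + 0.1 * i for t, i in zip(target, interference)]
print(f"{si_snr_improvement(estimate, mixture, target):.2f} dB")  # prints: 20.00 dB
```

Reducing the interference amplitude by a factor of 10 lowers its energy by a factor of 100, which corresponds to the 20 dB improvement in this toy case.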