SOTAVerified

Target Sound Extraction

Target Sound Extraction is the task of extracting the sound corresponding to a given class from an audio mixture. The mixture may also contain background noise at a relatively low amplitude compared to the foreground components. The target sound class is provided to the model as a string, an integer, or a one-hot encoding.
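The three query formats mentioned above are related by simple conversions. The following is a minimal sketch, assuming a small hypothetical class vocabulary (`SOUND_CLASSES`) and helper names invented for illustration; real datasets and models define their own label sets and input conventions.

```python
from typing import List

# Hypothetical class vocabulary (an assumption for illustration only).
SOUND_CLASSES = ["dog_bark", "siren", "speech", "door_knock"]


def class_to_index(label: str) -> int:
    """Map a string label to its integer index in the vocabulary."""
    return SOUND_CLASSES.index(label)


def index_to_one_hot(index: int, num_classes: int = len(SOUND_CLASSES)) -> List[float]:
    """Return a one-hot vector with 1.0 at the given index and 0.0 elsewhere."""
    if not 0 <= index < num_classes:
        raise ValueError(f"index {index} is outside [0, {num_classes})")
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


# The same target class expressed in the three formats.
query_string = "siren"
query_index = class_to_index(query_string)
query_one_hot = index_to_one_hot(query_index)
```

Which of the three forms a given model expects depends on its conditioning mechanism, so this conversion is only a starting point.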

Papers

Showing 1-10 of 16 papers

Title | Status | Hype
SoundSculpt: Direction and Semantics Driven Ambisonic Target Sound Extraction | - | 0
Leveraging Audio-Only Data for Text-Queried Target Sound Extraction | - | 0
Multichannel-to-Multichannel Target Sound Extraction Using Direction and Timestamp Clues | - | 0
Language-Queried Target Sound Extraction Without Parallel Training Data | - | 0
SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer | Code | 2
Cross-attention Inspired Selective State Space Models for Target Sound Extraction | Code | 1
Can all variations within the unified mask-based beamformer framework achieve identical peak extraction performance? | Code | 0
CATSE: A Context-Aware Framework for Causal Target Sound Extraction | - | 0
CLAPSep: Leveraging Contrastive Pre-trained Model for Multi-Modal Query-Conditioned Target Sound Extraction | Code | 1
Online Similarity-and-Independence-Aware Beamformer for Low-latency Target Sound Extraction | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | CLAPSep | SDRi | 10.08 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | CLAPSep | SDRi | 9.29 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Waveformer | SI-SNRi | 9.43 | - | Unverified
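
For readers unfamiliar with the SI-SNRi metric listed above, the sketch below computes scale-invariant signal-to-noise ratio improvement using its commonly used definition: the SI-SNR of the extracted signal minus the SI-SNR of the unprocessed mixture, both measured against the clean target and reported in dB. The function names and the toy signals are assumptions for illustration; the evaluation code behind the listed results may differ in details such as mean removal or epsilon handling.

```python
import math
from typing import Sequence


def si_snr(estimate: Sequence[float], reference: Sequence[float], eps: float = 1e-8) -> float:
    """Scale-invariant SNR in dB of `estimate` with respect to `reference`."""
    n = len(reference)
    if len(estimate) != n:
        raise ValueError("estimate and reference must have the same length")
    # Remove the mean so the metric ignores DC offsets.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]
    # Project the estimate onto the reference to get the scaled target part.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    scale = dot / ref_energy
    target = [scale * r for r in ref]
    # Everything not explained by the scaled reference counts as noise.
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(x * x for x in noise) + eps
    return 10.0 * math.log10(target_energy / noise_energy + eps)


def si_snr_improvement(estimate: Sequence[float], mixture: Sequence[float],
                       reference: Sequence[float]) -> float:
    """SI-SNRi: gain in SI-SNR of the estimate over the unprocessed mixture."""
    return si_snr(estimate, reference) - si_snr(mixture, reference)


# Toy signals (hypothetical): a target tone plus an interfering tone.
reference = [math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
interference = [math.sin(2 * math.pi * 13 * t / 100) for t in range(100)]
mixture = [r + i for r, i in zip(reference, interference)]
# A pretend extractor output that attenuates the interference by a factor of 10.
estimate = [r + 0.1 * i for r, i in zip(reference, interference)]

improvement = si_snr_improvement(estimate, mixture, reference)
print(f"SI-SNRi: {improvement:.2f} dB")
```

SDRi follows the same "improvement over the mixture" idea but is based on SDR, whose exact definition varies between toolkits, so it is not reproduced here.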