SOTAVerified

Target Speaker Extraction

Extract the speech of a specified target speaker from a multi-speaker conversation.

Papers

Showing 1–25 of 55 papers

| Title | Status | Hype |
| --- | --- | --- |
| Metis: A Foundation Speech Generation Model with Masked Generative Pre-training | Code | 9 |
| WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction | Code | 3 |
| Multi-Level Speaker Representation for Target Speaker Extraction | Code | 3 |
| TSELM: Target Speaker Extraction using Discrete Tokens and Language Models | Code | 2 |
| Typing to Listen at the Cocktail Party: Text-Guided Target Speaker Extraction | Code | 1 |
| Muse: Multi-modal target speaker extraction with visual cues | Code | 1 |
| L-SpEx: Localized Target Speaker Extraction | Code | 1 |
| Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention | Code | 1 |
| Target Speaker Verification with Selective Auditory Attention for Single and Multi-talker Speech | Code | 1 |
| USEF-TSE: Universal Speaker Embedding Free Target Speaker Extraction | Code | 1 |
| RTFS-Net: Recurrent Time-Frequency Modelling for Efficient Audio-Visual Speech Separation | Code | 1 |
| A Hybrid Continuity Loss to Reduce Over-Suppression for Time-domain Target Speaker Extraction | Code | 1 |
| LauraTSE: Target Speaker Extraction using Auto-Regressive Decoder-Only Language Models | Code | 1 |
| Selective Listening by Synchronizing Speech with Lips | Code | 1 |
| AV-CrossNet: an Audiovisual Complex Spectral Mapping Network for Speech Separation By Leveraging Narrow- and Cross-Band Modeling | Code | 1 |
| GPU-accelerated Guided Source Separation for Meeting Transcription | Code | 1 |
| FlowTSE: Target Speaker Extraction with Flow Matching | | 0 |
| Exploiting spatial information with the informed complex-valued spatial autoencoder for target speaker extraction | | 0 |
| Beamformer-Guided Target Speaker Extraction | | 0 |
| ExARN: self-attending RNN for target speaker extraction | | 0 |
| Enhancing Real-World Active Speaker Detection with Multi-Modal Extraction Pre-Training | | 0 |
| AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement | | 0 |
| MoMuSE: Momentum Multi-modal Target Speaker Extraction for Real-time Scenarios with Impaired Visual Cues | | 0 |
| Listen to Extract: Onset-Prompted Target Speaker Extraction | | 0 |
| Listening to Multi-talker Conversations: Modular and End-to-end Perspectives | | 0 |

No leaderboard results yet.