SOTAVerified

Target Speaker Extraction

Extract the speech of a specified target speaker from a multi-speaker conversation.

Papers

Showing 1-25 of 55 papers

| Title | Status | Hype |
| --- | --- | --- |
| Metis: A Foundation Speech Generation Model with Masked Generative Pre-training | Code | 9 |
| WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction | Code | 3 |
| Multi-Level Speaker Representation for Target Speaker Extraction | Code | 3 |
| TSELM: Target Speaker Extraction using Discrete Tokens and Language Models | Code | 2 |
| USEF-TSE: Universal Speaker Embedding Free Target Speaker Extraction | Code | 1 |
| Target Speaker Verification with Selective Auditory Attention for Single and Multi-talker Speech | Code | 1 |
| L-SpEx: Localized Target Speaker Extraction | Code | 1 |
| RTFS-Net: Recurrent Time-Frequency Modelling for Efficient Audio-Visual Speech Separation | Code | 1 |
| Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention | Code | 1 |
| Selective Listening by Synchronizing Speech with Lips | Code | 1 |
| Typing to Listen at the Cocktail Party: Text-Guided Target Speaker Extraction | Code | 1 |
| A Hybrid Continuity Loss to Reduce Over-Suppression for Time-domain Target Speaker Extraction | Code | 1 |
| AV-CrossNet: an Audiovisual Complex Spectral Mapping Network for Speech Separation By Leveraging Narrow- and Cross-Band Modeling | Code | 1 |
| Muse: Multi-modal target speaker extraction with visual cues | Code | 1 |
| LauraTSE: Target Speaker Extraction using Auto-Regressive Decoder-Only Language Models | Code | 1 |
| GPU-accelerated Guided Source Separation for Meeting Transcription | Code | 1 |
| M3ANet: Multi-scale and Multi-Modal Alignment Network for Brain-Assisted Target Speaker Extraction | Code | 0 |
| Spectron: Target Speaker Extraction using Conditional Transformer with Adversarial Refinement | Code | 0 |
| ImagineNET: Target Speaker Extraction with Intermittent Visual Cue through Embedding Inpainting | Code | 0 |
| Generative Speech Foundation Model Pretraining for High-Quality Speech Extraction and Restoration | Code | 0 |
| FlowTSE: Target Speaker Extraction with Flow Matching | | 0 |
| Exploiting spatial information with the informed complex-valued spatial autoencoder for target speaker extraction | | 0 |
| Beamformer-Guided Target Speaker Extraction | | 0 |
| ExARN: self-attending RNN for target speaker extraction | | 0 |
| Enhancing Real-World Active Speaker Detection with Multi-Modal Extraction Pre-Training | | 0 |

No leaderboard results yet.