SOTAVerified

Target Speaker Extraction

Extract the speech of a specified target speaker from a recording in which several people are talking.

Papers

Showing 1-25 of 55 papers

Title | Status | Hype
Metis: A Foundation Speech Generation Model with Masked Generative Pre-training | Code | 9
Multi-Level Speaker Representation for Target Speaker Extraction | Code | 3
WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction | Code | 3
TSELM: Target Speaker Extraction using Discrete Tokens and Language Models | Code | 2
LauraTSE: Target Speaker Extraction using Auto-Regressive Decoder-Only Language Models | Code | 1
USEF-TSE: Universal Speaker Embedding Free Target Speaker Extraction | Code | 1
AV-CrossNet: an Audiovisual Complex Spectral Mapping Network for Speech Separation By Leveraging Narrow- and Cross-Band Modeling | Code | 1
Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention | Code | 1
Typing to Listen at the Cocktail Party: Text-Guided Target Speaker Extraction | Code | 1
RTFS-Net: Recurrent Time-Frequency Modelling for Efficient Audio-Visual Speech Separation | Code | 1
GPU-accelerated Guided Source Separation for Meeting Transcription | Code | 1
A Hybrid Continuity Loss to Reduce Over-Suppression for Time-domain Target Speaker Extraction | Code | 1
L-SpEx: Localized Target Speaker Extraction | Code | 1
Selective Listening by Synchronizing Speech with Lips | Code | 1
Target Speaker Verification with Selective Auditory Attention for Single and Multi-talker Speech | Code | 1
Muse: Multi-modal target speaker extraction with visual cues | Code | 1
Incorporating Linguistic Constraints from External Knowledge Source for Audio-Visual Target Speech Extraction | | 0
M3ANet: Multi-scale and Multi-Modal Alignment Network for Brain-Assisted Target Speaker Extraction | Code | 0
FlowTSE: Target Speaker Extraction with Flow Matching | | 0
Listen to Extract: Onset-Prompted Target Speaker Extraction | | 0
C^2AV-TSE: Context and Confidence-aware Audio Visual Target Speaker Extraction | | 0
Target Speaker Extraction through Comparing Noisy Positive and Negative Audio Enrollments | | 0
AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement | | 0
Universal Speaker Embedding Free Target Speaker Extraction and Personal Voice Activity Detection | | 0
MoMuSE: Momentum Multi-modal Target Speaker Extraction for Real-time Scenarios with Impaired Visual Cues | | 0
Page 1 of 3

No leaderboard results yet.