SOTAVerified

Target Speaker Extraction

Extract the speech of a specified target speaker from a multi-speaker conversation.

Papers

Showing 1-25 of 55 papers

| Title | Status | Hype |
| --- | --- | --- |
| Incorporating Linguistic Constraints from External Knowledge Source for Audio-Visual Target Speech Extraction | | 0 |
| M3ANet: Multi-scale and Multi-Modal Alignment Network for Brain-Assisted Target Speaker Extraction | Code | 0 |
| FlowTSE: Target Speaker Extraction with Flow Matching | | 0 |
| Listen to Extract: Onset-Prompted Target Speaker Extraction | | 0 |
| LauraTSE: Target Speaker Extraction using Auto-Regressive Decoder-Only Language Models | Code | 1 |
| C^2AV-TSE: Context and Confidence-aware Audio Visual Target Speaker Extraction | | 0 |
| Target Speaker Extraction through Comparing Noisy Positive and Negative Audio Enrollments | | 0 |
| Metis: A Foundation Speech Generation Model with Masked Generative Pre-training | Code | 9 |
| AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement | | 0 |
| Universal Speaker Embedding Free Target Speaker Extraction and Personal Voice Activity Detection | | 0 |
| MoMuSE: Momentum Multi-modal Target Speaker Extraction for Real-time Scenarios with Impaired Visual Cues | | 0 |
| Multi-Level Speaker Representation for Target Speaker Extraction | Code | 3 |
| STCON System for the CHiME-8 Challenge | | 0 |
| Wanna hear your voice? A sample is all we need! | | 0 |
| Two-stage Framework for Robust Speech Emotion Recognition Using Target Speaker Extraction in Human Speech Noise Conditions | | 0 |
| Generative Speech Foundation Model Pretraining for High-Quality Speech Extraction and Restoration | Code | 0 |
| WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction | Code | 3 |
| TSELM: Target Speaker Extraction using Discrete Tokens and Language Models | Code | 2 |
| USEF-TSE: Universal Speaker Embedding Free Target Speaker Extraction | Code | 1 |
| Spectron: Target Speaker Extraction using Conditional Transformer with Adversarial Refinement | Code | 0 |
| Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning | | 0 |
| SpeakerBeam-SS: Real-time Target Speaker Extraction with Lightweight Conv-TasNet and State Space Modeling | | 0 |
| Binaural Selective Attention Model for Target Speaker Extraction | | 0 |
| AV-CrossNet: an Audiovisual Complex Spectral Mapping Network for Speech Separation By Leveraging Narrow- and Cross-Band Modeling | Code | 1 |
| Target Speaker Extraction with Curriculum Learning | | 0 |

No leaderboard results yet.