SOTAVerified

Target Speaker Extraction

Extract the speech of a specified target speaker from a multi-speaker recording.

Papers

Showing 1-50 of 55 papers

Title | Status | Hype
Metis: A Foundation Speech Generation Model with Masked Generative Pre-training | Code | 9
WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction | Code | 3
Multi-Level Speaker Representation for Target Speaker Extraction | Code | 3
TSELM: Target Speaker Extraction using Discrete Tokens and Language Models | Code | 2
L-SpEx: Localized Target Speaker Extraction | Code | 1
Typing to Listen at the Cocktail Party: Text-Guided Target Speaker Extraction | Code | 1
GPU-accelerated Guided Source Separation for Meeting Transcription | Code | 1
USEF-TSE: Universal Speaker Embedding Free Target Speaker Extraction | Code | 1
A Hybrid Continuity Loss to Reduce Over-Suppression for Time-domain Target Speaker Extraction | Code | 1
Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention | Code | 1
Target Speaker Verification with Selective Auditory Attention for Single and Multi-talker Speech | Code | 1
LauraTSE: Target Speaker Extraction using Auto-Regressive Decoder-Only Language Models | Code | 1
Muse: Multi-modal target speaker extraction with visual cues | Code | 1
AV-CrossNet: an Audiovisual Complex Spectral Mapping Network for Speech Separation By Leveraging Narrow- and Cross-Band Modeling | Code | 1
RTFS-Net: Recurrent Time-Frequency Modelling for Efficient Audio-Visual Speech Separation | Code | 1
Selective Listening by Synchronizing Speech with Lips | Code | 1
MoMuSE: Momentum Multi-modal Target Speaker Extraction for Real-time Scenarios with Impaired Visual Cues | | 0
Multi-Channel Target Speaker Extraction with Refinement: The WavLab Submission to the Second Clarity Enhancement Challenge | | 0
Multimodal Attention Fusion for Target Speaker Extraction | | 0
Multi-Talker MVDR Beamforming Based on Extended Complex Gaussian Mixture Model | | 0
New Insights on Target Speaker Extraction | | 0
Adapting self-supervised models to multi-talker speech recognition using speaker embeddings | | 0
Semi-supervised Time Domain Target Speaker Extraction with Attention | | 0
SpeakerBeam-SS: Real-time Target Speaker Extraction with Lightweight Conv-TasNet and State Space Modeling | | 0
Speaker-conditioned Target Speaker Extraction based on Customized LSTM Cells | | 0
Speaker-conditioning Single-channel Target Speaker Extraction using Conformer-based Architectures | | 0
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer | | 0
STCON System for the CHiME-8 Challenge | | 0
Target Speaker Extraction by Directly Exploiting Contextual Information in the Time-Frequency Domain | | 0
Target Speaker Extraction through Comparing Noisy Positive and Negative Audio Enrollments | | 0
Target Speaker Extraction with Curriculum Learning | | 0
The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction | | 0
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning | | 0
AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement | | 0
A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, and Extraction | | 0
Beamformer-Guided Target Speaker Extraction | | 0
Binaural Selective Attention Model for Target Speaker Extraction | | 0
C^2AV-TSE: Context and Confidence-aware Audio Visual Target Speaker Extraction | | 0
Coarse-to-Fine Recursive Speech Separation for Unknown Number of Speakers | | 0
Conditional Diffusion Model for Target Speaker Extraction | | 0
Enhancing Real-World Active Speaker Detection with Multi-Modal Extraction Pre-Training | | 0
ExARN: self-attending RNN for target speaker extraction | | 0
Exploiting spatial information with the informed complex-valued spatial autoencoder for target speaker extraction | | 0
FlowTSE: Target Speaker Extraction with Flow Matching | | 0
Improving Target Speaker Extraction with Sparse LDA-transformed Speaker Embeddings | | 0
Incorporating Linguistic Constraints from External Knowledge Source for Audio-Visual Target Speech Extraction | | 0
Listening to Multi-talker Conversations: Modular and End-to-end Perspectives | | 0
Listen to Extract: Onset-Prompted Target Speaker Extraction | | 0
Two-stage Framework for Robust Speech Emotion Recognition Using Target Speaker Extraction in Human Speech Noise Conditions | | 0
Universal Speaker Embedding Free Target Speaker Extraction and Personal Voice Activity Detection | | 0

No leaderboard results yet.