
Target Speaker Extraction

Extract the speech of a specified target speaker from a recording in which several people talk at once (a multi-talker mixture).
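The task can be stated as an input/output contract: the input is a mixture (target speech plus interfering speakers), and the output is an estimate of the target alone. The sketch below illustrates only that contract. It assumes sinusoids as stand-ins for speech, uses hypothetical helper names (`make_mixture`, `oracle_irm_extract`), and uses an *oracle* squared-magnitude ideal ratio mask that cheats by reading the clean target. Real systems cannot do this. They instead condition on a cue identifying the target, such as an enrollment utterance, lip video, or text, as in several of the papers listed below. For brevity the mask is applied to one FFT over the whole signal rather than to an STFT.

```python
import numpy as np

def make_mixture(target, interferers):
    """Hypothetical helper: sum the target with interfering 'speakers'."""
    return target + np.sum(interferers, axis=0)

def oracle_irm_extract(mixture, target, eps=1e-12):
    """Oracle extraction with an ideal ratio mask built from the clean target.

    Not a real TSE model: it needs the clean target, which a deployed
    system never has. One full-length FFT replaces an STFT for brevity.
    """
    Y = np.fft.rfft(mixture)
    S = np.fft.rfft(target)
    R = Y - S  # everything in the mixture that is not the target
    mask = np.abs(S) ** 2 / (np.abs(S) ** 2 + np.abs(R) ** 2 + eps)
    return np.fft.irfft(mask * Y, n=len(mixture))

fs = 8000                     # assumed sample rate (Hz)
t = np.arange(fs) / fs        # 1 second, so each integer frequency is one FFT bin
target = np.sin(2 * np.pi * 220 * t)              # stand-in for the target speaker
interferers = [0.8 * np.sin(2 * np.pi * 500 * t),  # stand-ins for other speakers
               0.5 * np.sin(2 * np.pi * 1200 * t)]

mixture = make_mixture(target, interferers)
estimate = oracle_irm_extract(mixture, target)
print("max error before:", np.max(np.abs(mixture - target)))
print("max error after: ", np.max(np.abs(estimate - target)))
```

Because the stand-in sources occupy disjoint frequency bins, the oracle mask separates them almost perfectly. Overlapping real speech is exactly where learned, cue-conditioned extractors are needed.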

Papers

Showing 1–50 of 55 papers

Title | Status | Hype
Metis: A Foundation Speech Generation Model with Masked Generative Pre-training | Code | 9
Multi-Level Speaker Representation for Target Speaker Extraction | Code | 3
WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction | Code | 3
TSELM: Target Speaker Extraction using Discrete Tokens and Language Models | Code | 2
Muse: Multi-modal target speaker extraction with visual cues | Code | 1
Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention | Code | 1
RTFS-Net: Recurrent Time-Frequency Modelling for Efficient Audio-Visual Speech Separation | Code | 1
USEF-TSE: Universal Speaker Embedding Free Target Speaker Extraction | Code | 1
Selective Listening by Synchronizing Speech with Lips | Code | 1
AV-CrossNet: an Audiovisual Complex Spectral Mapping Network for Speech Separation By Leveraging Narrow- and Cross-Band Modeling | Code | 1
LauraTSE: Target Speaker Extraction using Auto-Regressive Decoder-Only Language Models | Code | 1
L-SpEx: Localized Target Speaker Extraction | Code | 1
Target Speaker Verification with Selective Auditory Attention for Single and Multi-talker Speech | Code | 1
A Hybrid Continuity Loss to Reduce Over-Suppression for Time-domain Target Speaker Extraction | Code | 1
GPU-accelerated Guided Source Separation for Meeting Transcription | Code | 1
Typing to Listen at the Cocktail Party: Text-Guided Target Speaker Extraction | Code | 1
Spectron: Target Speaker Extraction using Conditional Transformer with Adversarial Refinement | Code | 0
M3ANet: Multi-scale and Multi-Modal Alignment Network for Brain-Assisted Target Speaker Extraction | Code | 0
ImagineNET: Target Speaker Extraction with Intermittent Visual Cue through Embedding Inpainting | Code | 0
Multimodal Attention Fusion for Target Speaker Extraction | - | 0
Multi-Talker MVDR Beamforming Based on Extended Complex Gaussian Mixture Model | - | 0
New Insights on Target Speaker Extraction | - | 0
Adapting self-supervised models to multi-talker speech recognition using speaker embeddings | - | 0
Semi-supervised Time Domain Target Speaker Extraction with Attention | - | 0
SpeakerBeam-SS: Real-time Target Speaker Extraction with Lightweight Conv-TasNet and State Space Modeling | - | 0
Speaker-conditioned Target Speaker Extraction based on Customized LSTM Cells | - | 0
Speaker-conditioning Single-channel Target Speaker Extraction using Conformer-based Architectures | - | 0
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer | - | 0
STCON System for the CHiME-8 Challenge | - | 0
Target Speaker Extraction by Directly Exploiting Contextual Information in the Time-Frequency Domain | - | 0
Target Speaker Extraction through Comparing Noisy Positive and Negative Audio Enrollments | - | 0
Target Speaker Extraction with Curriculum Learning | - | 0
The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction | - | 0
Two-stage Framework for Robust Speech Emotion Recognition Using Target Speaker Extraction in Human Speech Noise Conditions | - | 0
Universal Speaker Embedding Free Target Speaker Extraction and Personal Voice Activity Detection | - | 0
Wanna hear your voice? A sample is all we need! | - | 0
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning | - | 0
AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement | - | 0
A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, and Extraction | - | 0
Beamformer-Guided Target Speaker Extraction | - | 0
Binaural Selective Attention Model for Target Speaker Extraction | - | 0
C^2AV-TSE: Context and Confidence-aware Audio Visual Target Speaker Extraction | - | 0
Coarse-to-Fine Recursive Speech Separation for Unknown Number of Speakers | - | 0
Conditional Diffusion Model for Target Speaker Extraction | - | 0
Enhancing Real-World Active Speaker Detection with Multi-Modal Extraction Pre-Training | - | 0
ExARN: self-attending RNN for target speaker extraction | - | 0
Exploiting spatial information with the informed complex-valued spatial autoencoder for target speaker extraction | - | 0
FlowTSE: Target Speaker Extraction with Flow Matching | - | 0
Generative Speech Foundation Model Pretraining for High-Quality Speech Extraction and Restoration | - | 0
Improving Target Speaker Extraction with Sparse LDA-transformed Speaker Embeddings | - | 0

No leaderboard results yet.