
Target Speaker Extraction

Extract the speech of a specified target speaker from a multi-speaker conversation.

Papers

Showing 1-50 of 55 papers

Title | Status | Hype
Metis: A Foundation Speech Generation Model with Masked Generative Pre-training | Code | 9
Multi-Level Speaker Representation for Target Speaker Extraction | Code | 3
WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction | Code | 3
TSELM: Target Speaker Extraction using Discrete Tokens and Language Models | Code | 2
Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention | Code | 1
L-SpEx: Localized Target Speaker Extraction | Code | 1
GPU-accelerated Guided Source Separation for Meeting Transcription | Code | 1
Typing to Listen at the Cocktail Party: Text-Guided Target Speaker Extraction | Code | 1
AV-CrossNet: an Audiovisual Complex Spectral Mapping Network for Speech Separation By Leveraging Narrow- and Cross-Band Modeling | Code | 1
USEF-TSE: Universal Speaker Embedding Free Target Speaker Extraction | Code | 1
RTFS-Net: Recurrent Time-Frequency Modelling for Efficient Audio-Visual Speech Separation | Code | 1
A Hybrid Continuity Loss to Reduce Over-Suppression for Time-domain Target Speaker Extraction | Code | 1
Selective Listening by Synchronizing Speech with Lips | Code | 1
Target Speaker Verification with Selective Auditory Attention for Single and Multi-talker Speech | Code | 1
Muse: Multi-modal target speaker extraction with visual cues | Code | 1
LauraTSE: Target Speaker Extraction using Auto-Regressive Decoder-Only Language Models | Code | 1
ImagineNET: Target Speaker Extraction with Intermittent Visual Cue through Embedding Inpainting | Code | 0
Spectron: Target Speaker Extraction using Conditional Transformer with Adversarial Refinement | Code | 0
M3ANet: Multi-scale and Multi-Modal Alignment Network for Brain-Assisted Target Speaker Extraction | Code | 0
Generative Speech Foundation Model Pretraining for High-Quality Speech Extraction and Restoration | Code | 0
New Insights on Target Speaker Extraction | - | 0
Adapting self-supervised models to multi-talker speech recognition using speaker embeddings | - | 0
Semi-supervised Time Domain Target Speaker Extraction with Attention | - | 0
SpeakerBeam-SS: Real-time Target Speaker Extraction with Lightweight Conv-TasNet and State Space Modeling | - | 0
Speaker-conditioned Target Speaker Extraction based on Customized LSTM Cells | - | 0
Speaker-conditioning Single-channel Target Speaker Extraction using Conformer-based Architectures | - | 0
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer | - | 0
STCON System for the CHiME-8 Challenge | - | 0
Target Speaker Extraction by Directly Exploiting Contextual Information in the Time-Frequency Domain | - | 0
Target Speaker Extraction through Comparing Noisy Positive and Negative Audio Enrollments | - | 0
Target Speaker Extraction with Curriculum Learning | - | 0
The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction | - | 0
Two-stage Framework for Robust Speech Emotion Recognition Using Target Speaker Extraction in Human Speech Noise Conditions | - | 0
Universal Speaker Embedding Free Target Speaker Extraction and Personal Voice Activity Detection | - | 0
Wanna hear your voice? A sample is all we need! | - | 0
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning | - | 0
AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement | - | 0
A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, and Extraction | - | 0
Beamformer-Guided Target Speaker Extraction | - | 0
Binaural Selective Attention Model for Target Speaker Extraction | - | 0
C^2AV-TSE: Context and Confidence-aware Audio Visual Target Speaker Extraction | - | 0
Coarse-to-Fine Recursive Speech Separation for Unknown Number of Speakers | - | 0
Conditional Diffusion Model for Target Speaker Extraction | - | 0
Enhancing Real-World Active Speaker Detection with Multi-Modal Extraction Pre-Training | - | 0
ExARN: self-attending RNN for target speaker extraction | - | 0
Exploiting spatial information with the informed complex-valued spatial autoencoder for target speaker extraction | - | 0
FlowTSE: Target Speaker Extraction with Flow Matching | - | 0
Improving Target Speaker Extraction with Sparse LDA-transformed Speaker Embeddings | - | 0
Incorporating Linguistic Constraints from External Knowledge Source for Audio-Visual Target Speech Extraction | - | 0
Listening to Multi-talker Conversations: Modular and End-to-end Perspectives | - | 0

No leaderboard results yet.