SOTAVerified

Target Speaker Extraction

Extract the speech of a specified target speaker from a multi-speaker conversation.

Papers

Showing 26–50 of 55 papers

Title | Status | Hype
Speaker-conditioning Single-channel Target Speaker Extraction using Conformer-based Architectures | | 0
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer | | 0
STCON System for the CHiME-8 Challenge | | 0
Target Speaker Extraction by Directly Exploiting Contextual Information in the Time-Frequency Domain | | 0
Target Speaker Extraction through Comparing Noisy Positive and Negative Audio Enrollments | | 0
Target Speaker Extraction with Curriculum Learning | | 0
The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction | | 0
Two-stage Framework for Robust Speech Emotion Recognition Using Target Speaker Extraction in Human Speech Noise Conditions | | 0
Universal Speaker Embedding Free Target Speaker Extraction and Personal Voice Activity Detection | | 0
Wanna hear your voice? A sample is all we need! | | 0
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning | | 0
AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement | | 0
A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, and Extraction | | 0
Beamformer-Guided Target Speaker Extraction | | 0
Binaural Selective Attention Model for Target Speaker Extraction | | 0
C^2AV-TSE: Context and Confidence-aware Audio Visual Target Speaker Extraction | | 0
Coarse-to-Fine Recursive Speech Separation for Unknown Number of Speakers | | 0
Conditional Diffusion Model for Target Speaker Extraction | | 0
Enhancing Real-World Active Speaker Detection with Multi-Modal Extraction Pre-Training | | 0
ExARN: self-attending RNN for target speaker extraction | | 0
Exploiting spatial information with the informed complex-valued spatial autoencoder for target speaker extraction | | 0
FlowTSE: Target Speaker Extraction with Flow Matching | | 0
Improving Target Speaker Extraction with Sparse LDA-transformed Speaker Embeddings | | 0
Incorporating Linguistic Constraints from External Knowledge Source for Audio-Visual Target Speech Extraction | | 0
Listening to Multi-talker Conversations: Modular and End-to-end Perspectives | | 0

No leaderboard results yet.