| Metis: A Foundation Speech Generation Model with Masked Generative Pre-training | Feb 5, 2025 | Self-Supervised Learning, Speech Enhancement | Code available | 9 |
| Multi-Level Speaker Representation for Target Speaker Extraction | Oct 21, 2024 | Target Speaker Extraction | Code available | 3 |
| WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction | Sep 24, 2024 | Management, speech-recognition | Code available | 3 |
| TSELM: Target Speaker Extraction using Discrete Tokens and Language Models | Sep 12, 2024 | Audio Generation, Target Speaker Extraction | Code available | 2 |
| LauraTSE: Target Speaker Extraction using Auto-Regressive Decoder-Only Language Models | Apr 10, 2025 | Decoder, Language Modeling | Code available | 1 |
| USEF-TSE: Universal Speaker Embedding Free Target Speaker Extraction | Sep 4, 2024 | Speaker Recognition, Speech Separation | Code available | 1 |
| AV-CrossNet: an Audiovisual Complex Spectral Mapping Network for Speech Separation By Leveraging Narrow- and Cross-Band Modeling | Jun 17, 2024 | Speaker Separation, Speech Enhancement | Code available | 1 |
| Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention | Apr 29, 2024 | Target Speaker Extraction | Code available | 1 |
| Typing to Listen at the Cocktail Party: Text-Guided Target Speaker Extraction | Oct 11, 2023 | Language Modelling, Large Language Model | Code available | 1 |
| RTFS-Net: Recurrent Time-Frequency Modelling for Efficient Audio-Visual Speech Separation | Sep 29, 2023 | Audio-Visual Speech Recognition, speech-recognition | Code available | 1 |
| GPU-accelerated Guided Source Separation for Meeting Transcription | Dec 10, 2022 | blind source separation, CPU | Code available | 1 |
| A Hybrid Continuity Loss to Reduce Over-Suppression for Time-domain Target Speaker Extraction | Mar 31, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code available | 1 |
| L-SpEx: Localized Target Speaker Extraction | Feb 21, 2022 | Target Speaker Extraction | Code available | 1 |
| Selective Listening by Synchronizing Speech with Lips | Jun 14, 2021 | Lip Reading, Target Speaker Extraction | Code available | 1 |
| Target Speaker Verification with Selective Auditory Attention for Single and Multi-talker Speech | Mar 30, 2021 | Multi-Task Learning, Speaker Verification | Code available | 1 |
| Muse: Multi-modal target speaker extraction with visual cues | Oct 15, 2020 | Target Speaker Extraction | Code available | 1 |
| Incorporating Linguistic Constraints from External Knowledge Source for Audio-Visual Target Speech Extraction | Jun 11, 2025 | Speech Extraction, Target Speaker Extraction | Unverified | 0 |
| M3ANet: Multi-scale and Multi-Modal Alignment Network for Brain-Assisted Target Speaker Extraction | May 31, 2025 | Contrastive Learning, EEG | Code available | 0 |
| FlowTSE: Target Speaker Extraction with Flow Matching | May 20, 2025 | Target Speaker Extraction | Unverified | 0 |
| Listen to Extract: Onset-Prompted Target Speaker Extraction | May 8, 2025 | Target Speaker Extraction | Unverified | 0 |
| C^2AV-TSE: Context and Confidence-aware Audio Visual Target Speaker Extraction | Apr 1, 2025 | Target Speaker Extraction | Unverified | 0 |
| Target Speaker Extraction through Comparing Noisy Positive and Negative Audio Enrollments | Feb 23, 2025 | Target Speaker Extraction | Unverified | 0 |
| AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement | Jan 26, 2025 | Denoising, In-Context Learning | Unverified | 0 |
| Universal Speaker Embedding Free Target Speaker Extraction and Personal Voice Activity Detection | Jan 7, 2025 | Action Detection, Activity Detection | Unverified | 0 |
| MoMuSE: Momentum Multi-modal Target Speaker Extraction for Real-time Scenarios with Impaired Visual Cues | Dec 11, 2024 | Target Speaker Extraction | Unverified | 0 |
| STCON System for the CHiME-8 Challenge | Oct 17, 2024 | Data Augmentation, Speech Separation | Unverified | 0 |
| Wanna hear your voice? A sample is all we need! | Oct 1, 2024 | All, Speech Separation | Unverified | 0 |
| Two-stage Framework for Robust Speech Emotion Recognition Using Target Speaker Extraction in Human Speech Noise Conditions | Sep 29, 2024 | Emotion Recognition, Speech Emotion Recognition | Unverified | 0 |
| Generative Speech Foundation Model Pretraining for High-Quality Speech Extraction and Restoration | Sep 24, 2024 | Bandwidth Extension, Denoising | Code available | 0 |
| Spectron: Target Speaker Extraction using Conditional Transformer with Adversarial Refinement | Sep 2, 2024 | Target Speaker Extraction | Code available | 0 |
| Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning | Jul 21, 2024 | Representation Learning, Self-Supervised Learning | Unverified | 0 |
| SpeakerBeam-SS: Real-time Target Speaker Extraction with Lightweight Conv-TasNet and State Space Modeling | Jul 1, 2024 | Target Speaker Extraction | Unverified | 0 |
| Binaural Selective Attention Model for Target Speaker Extraction | Jun 18, 2024 | model, Target Speaker Extraction | Unverified | 0 |
| Target Speaker Extraction with Curriculum Learning | Jun 12, 2024 | Target Speaker Extraction | Unverified | 0 |
| Enhancing Real-World Active Speaker Detection with Multi-Modal Extraction Pre-Training | Apr 1, 2024 | Active Speaker Detection, Audio-Visual Active Speaker Detection | Unverified | 0 |
| Target Speaker Extraction by Directly Exploiting Contextual Information in the Time-Frequency Domain | Feb 27, 2024 | Target Speaker Extraction | Unverified | 0 |
| Listening to Multi-talker Conversations: Modular and End-to-end Perspectives | Feb 14, 2024 | GPU, speaker-diarization | Unverified | 0 |
| A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, and Extraction | Oct 12, 2023 | Denoising, Speech Enhancement | Unverified | 0 |
| Conditional Diffusion Model for Target Speaker Extraction | Oct 7, 2023 | model, Target Speaker Extraction | Unverified | 0 |
| The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction | Sep 15, 2023 | Audio-Visual Speech Recognition, speech-recognition | Unverified | 0 |
| SpeechX: Neural Codec Language Model as a Versatile Speech Transformer | Aug 14, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| Beamformer-Guided Target Speaker Extraction | Mar 15, 2023 | Target Speaker Extraction | Unverified | 0 |
| Multi-Channel Target Speaker Extraction with Refinement: The WavLab Submission to the Second Clarity Enhancement Challenge | Feb 15, 2023 | Speaker Separation, Speech Enhancement | Unverified | 0 |
| Improving Target Speaker Extraction with Sparse LDA-transformed Speaker Embeddings | Jan 16, 2023 | Speaker Verification, Speech Separation | Unverified | 0 |
| ExARN: self-attending RNN for target speaker extraction | Dec 2, 2022 | Speaker Identification, Target Speaker Extraction | Unverified | 0 |
| Adapting self-supervised models to multi-talker speech recognition using speaker embeddings | Nov 1, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| ImagineNET: Target Speaker Extraction with Intermittent Visual Cue through Embedding Inpainting | Oct 31, 2022 | Target Speaker Extraction | Code available | 0 |
| Exploiting spatial information with the informed complex-valued spatial autoencoder for target speaker extraction | Oct 27, 2022 | Position, Target Speaker Extraction | Unverified | 0 |
| Semi-supervised Time Domain Target Speaker Extraction with Attention | Jun 18, 2022 | Target Speaker Extraction | Unverified | 0 |
| Speaker-conditioning Single-channel Target Speaker Extraction using Conformer-based Architectures | May 27, 2022 | Target Speaker Extraction | Unverified | 0 |