Metis: A Foundation Speech Generation Model with Masked Generative Pre-training | Feb 5, 2025 | Self-Supervised Learning, Speech Enhancement | Code Available | 9
Overview of the Amphion Toolkit (v0.2) | Jan 26, 2025 | text-to-speech, Text to Speech | Code Available | 9
Zero-shot Voice Conversion with Diffusion Transformers | Nov 15, 2024 | In-Context Learning, Voice Conversion | Code Available | 7
SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation | Jan 24, 2024 | text-to-speech, Text to Speech | Code Available | 5
Improving Data Augmentation-based Cross-Speaker Style Transfer for TTS with Singing Voice, Style Filtering, and F0 Matching | Oct 8, 2024 | Data Augmentation, Style Transfer | Code Available | 4
FlashSpeech: Efficient Zero-Shot Speech Synthesis | Apr 23, 2024 | Rhythm, Speech Synthesis | Code Available | 3
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis | Nov 21, 2023 | Speech Synthesis, Super-Resolution | Code Available | 3
Audio Deepfake Detection with Self-Supervised XLS-R and SLS Classifier | Oct 28, 2024 | Audio Deepfake Detection, Audio Generation | Code Available | 2
SafeEar: Content Privacy-Preserving Audio Deepfake Detection | Sep 14, 2024 | Audio Deepfake Detection, DeepFake Detection | Code Available | 2
SaMoye: Zero-shot Singing Voice Conversion Model Based on Feature Disentanglement and Enhancement | Jul 10, 2024 | Disentanglement, Voice Conversion | Code Available | 2
Coding Speech through Vocal Tract Kinematics | Jun 18, 2024 | Voice Conversion | Code Available | 2
High-Fidelity Neural Phonetic Posteriorgrams | Feb 27, 2024 | Voice Conversion | Code Available | 2
DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations without Text Alignment | Jan 16, 2024 | Disentanglement, Self-Supervised Learning | Code Available | 2
CoMoSVC: Consistency Model-based Singing Voice Conversion | Jan 3, 2024 | GPU, model | Code Available | 2
Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust Pitch Generation and Masked Prior for Zero-shot Speaker Adaptation | Nov 8, 2023 | Style Transfer, Voice Conversion | Code Available | 2
Low-latency Real-time Voice Conversion on CPU | Nov 1, 2023 | CPU, Knowledge Distillation | Code Available | 2
Phoneme Hallucinator: One-shot Voice Conversion via Set Expansion | Aug 11, 2023 | Voice Conversion | Code Available | 2
Voice Conversion With Just Nearest Neighbors | May 30, 2023 | Voice Conversion | Code Available | 2
DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion | May 25, 2023 | Denoising, Style Transfer | Code Available | 2
M4Singer: a Multi-Style, Multi-Singer and Musical Score Provided Mandarin Singing Corpus | Dec 29, 2022 | Music Transcription, Singing Voice Synthesis | Code Available | 2
FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion | Oct 27, 2022 | Data Augmentation, text annotation | Code Available | 2
iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform | Mar 4, 2022 | Speech Synthesis, text-to-speech | Code Available | 2
Cotatron: Transcription-Guided Speech Encoder for Any-to-Many Voice Conversion without Parallel Data | May 7, 2020 | Voice Conversion | Code Available | 2
Unsupervised Speech Decomposition via Triple Information Bottleneck | Apr 23, 2020 | Rhythm, Style Transfer | Code Available | 2
AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss | May 14, 2019 | Style Transfer, Voice Conversion | Code Available | 2
Training-Free Voice Conversion with Factorized Optimal Transport | Jun 11, 2025 | Voice Conversion | Code Available | 1
kNN-SVC: Robust Zero-Shot Singing Voice Conversion with Additive Synthesis and Concatenation Smoothness Optimization | Apr 8, 2025 | Voice Conversion | Code Available | 1
Region-Based Optimization in Continual Learning for Audio Deepfake Detection | Dec 16, 2024 | Audio Deepfake Detection, Continual Learning | Code Available | 1
Where are we in audio deepfake detection? A systematic analysis over generative and detection models | Oct 6, 2024 | Audio Deepfake Detection, Audio Synthesis | Code Available | 1
AutoVisual Fusion Suite: A Comprehensive Evaluation of Image Segmentation and Voice Conversion Tools on HuggingFace Platform | Dec 17, 2023 | Image Segmentation, Segmentation | Code Available | 1
What to Remember: Self-Adaptive Continual Learning for Audio Deepfake Detection | Dec 15, 2023 | Audio Deepfake Detection, Continual Learning | Code Available | 1
Improving fairness for spoken language understanding in atypical speech with Text-to-Speech | Nov 16, 2023 | Data Augmentation, Fairness | Code Available | 1
CSLP-AE: A Contrastive Split-Latent Permutation Autoencoder Framework for Zero-Shot Electroencephalography Signal Conversion | Nov 13, 2023 | Contrastive Learning, EEG | Code Available | 1
BiSinger: Bilingual Singing Voice Synthesis | Sep 25, 2023 | Singing Voice Synthesis, text-to-speech | Code Available | 1
HM-Conformer: A Conformer-based audio deepfake detection system with hierarchical pooling and multi-level classification token aggregation methods | Sep 15, 2023 | Audio Deepfake Detection, DeepFake Detection | Code Available | 1
StarGAN-VC++: Towards Emotion Preserving Voice Conversion Using Deep Embeddings | Sep 14, 2023 | Generative Adversarial Network, Voice Conversion | Code Available | 1
Emo-StarGAN: A Semi-Supervised Any-to-Many Non-Parallel Emotion-Preserving Voice Conversion | Sep 14, 2023 | Voice Conversion | Code Available | 1
Evaluating Methods for Ground-Truth-Free Foreign Accent Conversion | Sep 5, 2023 | Voice Conversion | Code Available | 1
FSD: An Initial Chinese Dataset for Fake Song Detection | Sep 5, 2023 | Audio Deepfake Detection, DeepFake Detection | Code Available | 1
Anonymizing Speech: Evaluating and Designing Speaker Anonymization Techniques | Aug 5, 2023 | Quantization, Speaker anonymization | Code Available | 1
Rhythm Modeling for Voice Conversion | Jul 12, 2023 | Rhythm, Voice Conversion | Code Available | 1
Disentanglement in a GAN for Unconditional Speech Synthesis | Jul 4, 2023 | Disentanglement, Generative Adversarial Network | Code Available | 1
Using joint training speaker encoder with consistency loss to achieve cross-lingual voice conversion and expressive voice conversion | Jul 1, 2023 | speech-recognition, Speech Recognition | Code Available | 1
The Singing Voice Conversion Challenge 2023 | Jun 26, 2023 | Voice Conversion | Code Available | 1
DuTa-VC: A Duration-aware Typical-to-atypical Voice Conversion Approach with Diffusion Probabilistic Model | Jun 18, 2023 | Data Augmentation, Decoder | Code Available | 1
TriAAN-VC: Triple Adaptive Attention Normalization for Any-to-Any Voice Conversion | Mar 16, 2023 | Decoder, Voice Conversion | Code Available | 1
StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer from Style-Based TTS Models | Dec 29, 2022 | Data Augmentation, text-to-speech | Code Available | 1
Speaking Style Conversion in the Waveform Domain Using Discrete Self-Supervised Units | Dec 19, 2022 | Rhythm, Voice Conversion | Code Available | 1
SpeechLMScore: Evaluating speech generation using speech language model | Dec 8, 2022 | Language Modeling, Language Modelling | Code Available | 1
Hiding speaker's sex in speech using zero-evidence speaker representation in an analysis/synthesis pipeline | Nov 29, 2022 | Voice Conversion | Code Available | 1