Overview of the Amphion Toolkit (v0.2). Jan 26, 2025. Tags: text-to-speech, Text to Speech. Code Available 95
Metis: A Foundation Speech Generation Model with Masked Generative Pre-training. Feb 5, 2025. Tags: Self-Supervised Learning, Speech Enhancement. Code Available 95
Zero-shot Voice Conversion with Diffusion Transformers. Nov 15, 2024. Tags: In-Context Learning, Voice Conversion. Code Available 75
SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation. Jan 24, 2024. Tags: text-to-speech, Text to Speech. Code Available 55
Improving Data Augmentation-based Cross-Speaker Style Transfer for TTS with Singing Voice, Style Filtering, and F0 Matching. Oct 8, 2024. Tags: Data Augmentation, Style Transfer. Code Available 45
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis. Nov 21, 2023. Tags: Speech Synthesis, Super-Resolution. Code Available 35
FlashSpeech: Efficient Zero-Shot Speech Synthesis. Apr 23, 2024. Tags: Rhythm, Speech Synthesis. Code Available 35
DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations without Text Alignment. Jan 16, 2024. Tags: Disentanglement, Self-Supervised Learning. Code Available 25
iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform. Mar 4, 2022. Tags: Speech Synthesis, text-to-speech. Code Available 25
CoMoSVC: Consistency Model-based Singing Voice Conversion. Jan 3, 2024. Tags: GPU, model. Code Available 25
DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion. May 25, 2023. Tags: Denoising, Style Transfer. Code Available 25
SaMoye: Zero-shot Singing Voice Conversion Model Based on Feature Disentanglement and Enhancement. Jul 10, 2024. Tags: Disentanglement, Voice Conversion. Code Available 25
FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion. Oct 27, 2022. Tags: Data Augmentation, text annotation. Code Available 25
Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust Pitch Generation and Masked Prior for Zero-shot Speaker Adaptation. Nov 8, 2023. Tags: Style Transfer, Voice Conversion. Code Available 25
Phoneme Hallucinator: One-shot Voice Conversion via Set Expansion. Aug 11, 2023. Tags: Voice Conversion. Code Available 25
Unsupervised Speech Decomposition via Triple Information Bottleneck. Apr 23, 2020. Tags: Rhythm, Style Transfer. Code Available 25
Low-latency Real-time Voice Conversion on CPU. Nov 1, 2023. Tags: CPU, Knowledge Distillation. Code Available 25
SafeEar: Content Privacy-Preserving Audio Deepfake Detection. Sep 14, 2024. Tags: Audio Deepfake Detection, DeepFake Detection. Code Available 25
AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss. May 14, 2019. Tags: Style Transfer, Voice Conversion. Code Available 25
Cotatron: Transcription-Guided Speech Encoder for Any-to-Many Voice Conversion without Parallel Data. May 7, 2020. Tags: Voice Conversion. Code Available 25
Coding Speech through Vocal Tract Kinematics. Jun 18, 2024. Tags: Voice Conversion. Code Available 25
Audio Deepfake Detection with Self-Supervised XLS-R and SLS Classifier. Oct 28, 2024. Tags: Audio Deepfake Detection, Audio Generation. Code Available 25
High-Fidelity Neural Phonetic Posteriorgrams. Feb 27, 2024. Tags: Voice Conversion. Code Available 25
M4Singer: a Multi-Style, Multi-Singer and Musical Score Provided Mandarin Singing Corpus. Dec 29, 2022. Tags: Music Transcription, Singing Voice Synthesis. Code Available 25
Voice Conversion With Just Nearest Neighbors. May 30, 2023. Tags: Voice Conversion. Code Available 25
Disentanglement in a GAN for Unconditional Speech Synthesis. Jul 4, 2023. Tags: Disentanglement, Generative Adversarial Network. Code Available 15
Diffusion-Based Voice Conversion with Fast Maximum Likelihood Sampling Scheme. Sep 28, 2021. Tags: Speech Synthesis, Voice Conversion. Code Available 15
DuTa-VC: A Duration-aware Typical-to-atypical Voice Conversion Approach with Diffusion Probabilistic Model. Jun 18, 2023. Tags: Data Augmentation, Decoder. Code Available 15
FSD: An Initial Chinese Dataset for Fake Song Detection. Sep 5, 2023. Tags: Audio Deepfake Detection, DeepFake Detection. Code Available 15
Defending Your Voice: Adversarial Attack on Voice Conversion. May 18, 2020. Tags: Adversarial Attack, Voice Conversion. Code Available 15
FMFCC-A: A Challenging Mandarin Dataset for Synthetic Speech Detection. Oct 18, 2021. Tags: Speech Synthesis, Synthetic Speech Detection. Code Available 15
DeID-VC: Speaker De-identification via Zero-shot Pseudo Voice Conversion. Sep 9, 2022. Tags: De-identification, Speaker Verification. Code Available 15
Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning. Jun 15, 2022. Tags: Attribute, Emotion Classification. Code Available 15
Efficient Non-Autoregressive GAN Voice Conversion using VQWav2vec Features and Dynamic Convolution. Mar 31, 2022. Tags: Voice Conversion. Code Available 15
FragmentVC: Any-to-Any Voice Conversion by End-to-End Extracting and Fusing Fine-Grained Voice Fragments With Attention. Oct 27, 2020. Tags: Disentanglement, Speaker Verification. Code Available 15
GAN You Hear Me? Reclaiming Unconditional Speech Synthesis from Diffusion Models. Oct 11, 2022. Tags: Disentanglement, Generative Adversarial Network. Code Available 15
CSLP-AE: A Contrastive Split-Latent Permutation Autoencoder Framework for Zero-Shot Electroencephalography Signal Conversion. Nov 13, 2023. Tags: Contrastive Learning, EEG. Code Available 15
A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion. Nov 3, 2021. Tags: Representation Learning, Voice Conversion. Code Available 15
Evaluating Methods for Ground-Truth-Free Foreign Accent Conversion. Sep 5, 2023. Tags: Voice Conversion. Code Available 15
ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on Pitch and Speed. Sep 23, 2022. Tags: Pitch control, Speech Synthesis. Code Available 15
Controllable and Interpretable Singing Voice Decomposition via Assem-VC. Oct 25, 2021. Tags: Voice Conversion. Code Available 15
CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectrogram Conversion. Oct 22, 2020. Tags: Voice Conversion. Code Available 15
CinC-GAN for Effective F0 prediction for Whisper-to-Normal Speech Conversion. Aug 18, 2020. Tags: Prediction, Voice Conversion. Code Available 15
Converting Anyone's Emotion: Towards Speaker-Independent Emotional Voice Conversion. May 13, 2020. Tags: Decoder, Voice Conversion. Code Available 15
AraBERT: Transformer-based Model for Arabic Language Understanding. Feb 28, 2020. Tags: model, named-entity-recognition. Code Available 15
crank: An Open-Source Software for Nonparallel Voice Conversion Based on Vector-Quantized Variational Autoencoder. Mar 4, 2021. Tags: Voice Conversion. Code Available 15
Building Bilingual and Code-Switched Voice Conversion with Limited Training Data Using Embedding Consistency Loss. Apr 22, 2021. Tags: Voice Cloning, Voice Conversion. Code Available 15
End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions. May 19, 2022. Tags: Speech Synthesis, Style Transfer. Code Available 15
Deep Learning Based Assessment of Synthetic Speech Naturalness. Apr 23, 2021. Tags: Deep Learning, Prediction. Code Available 15
F0-consistent many-to-many non-parallel voice conversion via conditional autoencoder. Apr 15, 2020. Tags: Style Transfer, Voice Conversion. Code Available 15