Overview of the Amphion Toolkit (v0.2) | Jan 26, 2025 | text-to-speech, Text to Speech | Code available (9)
Metis: A Foundation Speech Generation Model with Masked Generative Pre-training | Feb 5, 2025 | Self-Supervised Learning, Speech Enhancement | Code available (9)
Zero-shot Voice Conversion with Diffusion Transformers | Nov 15, 2024 | In-Context Learning, Voice Conversion | Code available (7)
SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation | Jan 24, 2024 | text-to-speech, Text to Speech | Code available (5)
Improving Data Augmentation-based Cross-Speaker Style Transfer for TTS with Singing Voice, Style Filtering, and F0 Matching | Oct 8, 2024 | Data Augmentation, Style Transfer | Code available (4)
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis | Nov 21, 2023 | Speech Synthesis, Super-Resolution | Code available (3)
FlashSpeech: Efficient Zero-Shot Speech Synthesis | Apr 23, 2024 | Rhythm, Speech Synthesis | Code available (3)
FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion | Oct 27, 2022 | Data Augmentation, text annotation | Code available (2)
SafeEar: Content Privacy-Preserving Audio Deepfake Detection | Sep 14, 2024 | Audio Deepfake Detection, DeepFake Detection | Code available (2)
Audio Deepfake Detection with Self-Supervised XLS-R and SLS Classifier | Oct 28, 2024 | Audio Deepfake Detection, Audio Generation | Code available (2)
Phoneme Hallucinator: One-shot Voice Conversion via Set Expansion | Aug 11, 2023 | Voice Conversion | Code available (2)
iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform | Mar 4, 2022 | Speech Synthesis, text-to-speech | Code available (2)
AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss | May 14, 2019 | Style Transfer, Voice Conversion | Code available (2)
DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion | May 25, 2023 | Denoising, Style Transfer | Code available (2)
Voice Conversion With Just Nearest Neighbors | May 30, 2023 | Voice Conversion | Code available (2)
SaMoye: Zero-shot Singing Voice Conversion Model Based on Feature Disentanglement and Enhancement | Jul 10, 2024 | Disentanglement, Voice Conversion | Code available (2)
CoMoSVC: Consistency Model-based Singing Voice Conversion | Jan 3, 2024 | GPU, model | Code available (2)
Low-latency Real-time Voice Conversion on CPU | Nov 1, 2023 | CPU, Knowledge Distillation | Code available (2)
Cotatron: Transcription-Guided Speech Encoder for Any-to-Many Voice Conversion without Parallel Data | May 7, 2020 | Voice Conversion | Code available (2)
DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations without Text Alignment | Jan 16, 2024 | Disentanglement, Self-Supervised Learning | Code available (2)
Diff-HierVC: Diffusion-based Hierarchical Voice Conversion with Robust Pitch Generation and Masked Prior for Zero-shot Speaker Adaptation | Nov 8, 2023 | Style Transfer, Voice Conversion | Code available (2)
Coding Speech through Vocal Tract Kinematics | Jun 18, 2024 | Voice Conversion | Code available (2)
High-Fidelity Neural Phonetic Posteriorgrams | Feb 27, 2024 | Voice Conversion | Code available (2)
M4Singer: a Multi-Style, Multi-Singer and Musical Score Provided Mandarin Singing Corpus | Dec 29, 2022 | Music Transcription, Singing Voice Synthesis | Code available (2)
Unsupervised Speech Decomposition via Triple Information Bottleneck | Apr 23, 2020 | Rhythm, Style Transfer | Code available (2)
FMFCC-A: A Challenging Mandarin Dataset for Synthetic Speech Detection | Oct 18, 2021 | Speech Synthesis, Synthetic Speech Detection | Code available (1)
FastSVC: Fast Cross-Domain Singing Voice Conversion with Feature-wise Linear Modulation | Nov 11, 2020 | Voice Conversion | Code available (1)
F0-consistent many-to-many non-parallel voice conversion via conditional autoencoder | Apr 15, 2020 | Style Transfer, Voice Conversion | Code available (1)
FSD: An Initial Chinese Dataset for Fake Song Detection | Sep 5, 2023 | Audio Deepfake Detection, DeepFake Detection | Code available (1)
FragmentVC: Any-to-Any Voice Conversion by End-to-End Extracting and Fusing Fine-Grained Voice Fragments With Attention | Oct 27, 2020 | Disentanglement, Speaker Verification | Code available (1)
Emotionless: Privacy-Preserving Speech Analysis for Voice Assistants | Aug 9, 2019 | Emotion Recognition, Privacy Preserving | Code available (1)
Emotional Voice Conversion: Theory, Databases and ESD | May 31, 2021 | Voice Conversion | Code available (1)
End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions | May 19, 2022 | Speech Synthesis, Style Transfer | Code available (1)
Efficient Non-Autoregressive GAN Voice Conversion using VQWav2vec Features and Dynamic Convolution | Mar 31, 2022 | Voice Conversion | Code available (1)
Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning | Jun 15, 2022 | Attribute, Emotion Classification | Code available (1)
Emo-StarGAN: A Semi-Supervised Any-to-Many Non-Parallel Emotion-Preserving Voice Conversion | Sep 14, 2023 | Voice Conversion | Code available (1)
Evaluating Methods for Ground-Truth-Free Foreign Accent Conversion | Sep 5, 2023 | Voice Conversion | Code available (1)
GAN You Hear Me? Reclaiming Unconditional Speech Synthesis from Diffusion Models | Oct 11, 2022 | Disentanglement, Generative Adversarial Network | Code available (1)
DeID-VC: Speaker De-identification via Zero-shot Pseudo Voice Conversion | Sep 9, 2022 | De-identification, Speaker Verification | Code available (1)
A Comparison of Discrete and Soft Speech Units for Improved Voice Conversion | Nov 3, 2021 | Representation Learning, Voice Conversion | Code available (1)
Deep Learning Based Assessment of Synthetic Speech Naturalness | Apr 23, 2021 | Deep Learning, Prediction | Code available (1)
CSLP-AE: A Contrastive Split-Latent Permutation Autoencoder Framework for Zero-Shot Electroencephalography Signal Conversion | Nov 13, 2023 | Contrastive Learning, EEG | Code available (1)
crank: An Open-Source Software for Nonparallel Voice Conversion Based on Vector-Quantized Variational Autoencoder | Mar 4, 2021 | Voice Conversion | Code available (1)
CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectrogram Conversion | Oct 22, 2020 | Voice Conversion | Code available (1)
AraBERT: Transformer-based Model for Arabic Language Understanding | Feb 28, 2020 | model, named-entity-recognition | Code available (1)
Defending Your Voice: Adversarial Attack on Voice Conversion | May 18, 2020 | Adversarial Attack, Voice Conversion | Code available (1)
Converting Anyone's Emotion: Towards Speaker-Independent Emotional Voice Conversion | May 13, 2020 | Decoder, Voice Conversion | Code available (1)
ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on Pitch and Speed | Sep 23, 2022 | Pitch control, Speech Synthesis | Code available (1)
DuTa-VC: A Duration-aware Typical-to-atypical Voice Conversion Approach with Diffusion Probabilistic Model | Jun 18, 2023 | Data Augmentation, Decoder | Code available (1)
CycleTransGAN-EVC: A CycleGAN-based Emotional Voice Conversion Model with Transformer | Nov 30, 2021 | Voice Conversion | Code available (1)