RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction | Mar 8, 2024 | Audio Generation, Computational Efficiency
EVA-GAN: Enhanced Various Audio Generation via Scalable Generative Adversarial Networks | Jan 31, 2024 | Audio Generation, Speech Synthesis
Generative Adversarial Training for Text-to-Speech Synthesis Based on Raw Phonetic Input and Explicit Prosody Modelling | Oct 14, 2023 | Speech Synthesis, text-to-speech
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT | Oct 7, 2023 | Audio captioning, Automatic Speech Recognition
P-Flow: A Fast and Data-Efficient Zero-Shot TTS through Speech Prompting | Sep 22, 2023 | Decoder, Speech Synthesis
HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform | Sep 18, 2023 | Speech Synthesis
FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec | Sep 14, 2023 | Automatic Speech Recognition, speech-recognition
BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network | Sep 6, 2023 | Generative Adversarial Network, Speech Synthesis
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model | May 11, 2023 | Denoising, GPU
Source-Filter-Based Generative Adversarial Neural Vocoder for High Fidelity Speech Synthesis | Apr 26, 2023 | Speech Synthesis, text-to-speech
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers | Apr 18, 2023 | In-Context Learning, Speech Synthesis
A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech | Feb 8, 2023 | Code Generation, Diversity
Towards Building Text-To-Speech Systems for the Next Billion Users | Nov 17, 2022 | Diversity, Speech Synthesis
StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis | May 30, 2022 | Data Augmentation, Self-Supervised Learning
GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech | May 15, 2022 | Speech Synthesis, Style Transfer
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality | May 9, 2022 | Sentence, Speech Synthesis
FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis | Apr 21, 2022 | Denoising, GPU
BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis | Mar 25, 2022 | Image Generation, Speech Synthesis
iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform | Mar 4, 2022 | Speech Synthesis, text-to-speech
Generative Modeling for Low Dimensional Speech Attributes with Neural Spline Flows | Mar 3, 2022 | Speech Synthesis, text-to-speech
Conditional Diffusion Probabilistic Model for Speech Enhancement | Feb 10, 2022 | model, Speech Enhancement
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis | Oct 12, 2020 | CPU, GPU
Improving Opus Low Bit Rate Quality with Neural Speech Synthesis | Aug 10, 2020 | Decoder, Speech Synthesis
Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram | Oct 25, 2019 | Generative Adversarial Network, GPU
Using Speech Synthesis to Train End-to-End Spoken Language Understanding Models | Oct 21, 2019 | Data Augmentation, Natural Language Understanding
FastSpeech: Fast, Robust and Controllable Text to Speech | May 22, 2019 | Decoder, Speech Synthesis
A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet | Mar 28, 2019 | Speech Synthesis
LPCNet: Improving Neural Speech Synthesis Through Linear Prediction | Oct 28, 2018 | Prediction, Speech Synthesis
Neural Speech Synthesis with Transformer Network | Sep 19, 2018 | Decoder, Machine Translation
Efficient Neural Audio Synthesis | Feb 23, 2018 | Audio Synthesis, CPU
InstructTTSEval: Benchmarking Complex Natural-Language Instruction Following in Text-to-Speech Systems | Jun 19, 2025 | Benchmarking, Descriptive
Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models | May 21, 2025 | Bayesian Optimization, Speech Synthesis
SafeSpeech: Robust and Universal Voice Protection Against Malicious Speech Synthesis | Apr 14, 2025 | Face Swapping, Speech Synthesis
Developing multilingual speech synthesis system for Ojibwe, Mi'kmaq, and Maliseet | Feb 4, 2025 | Speech Synthesis, text-to-speech
Retrieval-Augmented Dialogue Knowledge Aggregation for Expressive Conversational Speech Synthesis | Jan 11, 2025 | Attribute, Benchmarking
AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder | Jan 9, 2025 | Pitch Classification, Pitch control
Region-Based Optimization in Continual Learning for Audio Deepfake Detection | Dec 16, 2024 | Audio Deepfake Detection, Continual Learning
SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers | Nov 15, 2024 | Image Generation, Speech Synthesis
Mitigating Unauthorized Speech Synthesis for Voice Protection | Oct 28, 2024 | Data Augmentation, Face Swapping
STTATTS: Unified Speech-To-Text And Text-To-Speech Model | Oct 24, 2024 | Multi-Task Learning, speech-recognition
PRESENT: Zero-Shot Text-to-Prosody Control | Aug 13, 2024 | Prosody Prediction, Speech Synthesis
Generative Expressive Conversational Speech Synthesis | Jul 31, 2024 | Speech Synthesis
VoxSim: A perceptual voice similarity dataset | Jul 26, 2024 | Benchmarking, Speaker Recognition
dMel: Speech Tokenization made Simple | Jul 22, 2024 | Decoder, Language Modeling
Rasa: Building Expressive Speech Synthesis Systems for Indian Languages in Low-resource Settings | Jul 19, 2024 | Expressive Speech Synthesis, Speech Synthesis
Fine-Grained and Interpretable Neural Speech Editing | Jul 7, 2024 | Data Augmentation, Speech Synthesis
Articulatory Phonetics Informed Controllable Expressive Speech Synthesis | Jun 15, 2024 | Expressive Speech Synthesis, Speech Synthesis
UMETTS: A Unified Framework for Emotional Text-to-Speech Synthesis with Multimodal Prompts | Apr 29, 2024 | Contrastive Learning, Speech Synthesis
HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks | Apr 6, 2024 | Domain Adaptation, Speech Synthesis
KazEmoTTS: A Dataset for Kazakh Emotional Text-to-Speech Synthesis | Apr 1, 2024 | Speech Synthesis, text-to-speech