Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers Jan 5, 2023 In-Context Learning Language Modeling
Code Code Available 75 ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech Nov 7, 2022 Representation Learning Speech Representation Learning
Code Code Available 65 PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit May 20, 2022 All Automatic Speech Recognition (ASR)
Code Code Available 65 Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling Mar 7, 2023 In-Context Learning Language Modeling
Code Code Available 55 StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning Jun 5, 2024 Automatic Speech Recognition (ASR) de-en
Code Code Available 55 Enhancing Suno's Bark Text-to-Speech Model: Addressing Limitations Through Meta's Encodec and Pre-Trained Hubert Apr 18, 2023 Audio Generation Expressive Speech Synthesis
Code Code Available 45 ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching Jun 16, 2025 Decoder Speech Synthesis
Code Code Available 45 MoonCast: High-Quality Zero-Shot Podcast Generation Mar 18, 2025 Speech Synthesis text-to-speech
Code Code Available 35 ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech Jul 13, 2022 Denoising GPU
Code Code Available 35 Matcha-TTS: A fast TTS architecture with conditional flow matching Sep 6, 2023 Acoustic Modelling Decoder
Code Code Available 35 Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis Oct 30, 2024 Speech Synthesis text-to-speech
Code Code Available 25 A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech Feb 8, 2023 Code Generation Diversity
Code Code Available 25 iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform Mar 4, 2022 Speech Synthesis text-to-speech
Code Code Available 25 EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech Jun 12, 2024 Emotional Speech Synthesis text-to-speech
Code Code Available 25 NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality May 9, 2022 Sentence Speech Synthesis
Code Code Available 25 Efficient Neural Audio Synthesis Feb 23, 2018 Audio Synthesis CPU
Code Code Available 25 Generative Adversarial Training for Text-to-Speech Synthesis Based on Raw Phonetic Input and Explicit Prosody Modelling Oct 14, 2023 Speech Synthesis text-to-speech
Code Code Available 25 FastSpeech: Fast, Robust and Controllable Text-to-Speech May 22, 2019 Decoder text-to-speech
Code Code Available 25 Towards Building Text-To-Speech Systems for the Next Billion Users Nov 17, 2022 Diversity Speech Synthesis
Code Code Available 25 Neural Speech Synthesis with Transformer Network Sep 19, 2018 Decoder Machine Translation
Code Code Available 25 Generative Modeling for Low Dimensional Speech Attributes with Neural Spline Flows Mar 3, 2022 Speech Synthesis text-to-speech
Code Code Available 25 GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech May 15, 2022 Speech Synthesis Style Transfer
Code Code Available 25 FastSpeech: Fast, Robust and Controllable Text to Speech May 22, 2019 Decoder Speech Synthesis
Code Code Available 25 SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis Sep 11, 2024 Decoder Speech Synthesis
Code Code Available 25 Sample-Efficient Diffusion for Text-To-Speech Synthesis Sep 1, 2024 Language Modeling Language Modelling
Code Code Available 25 FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec Sep 14, 2023 Automatic Speech Recognition speech-recognition
Code Code Available 25 Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram Oct 25, 2019 Generative Adversarial Network GPU
Code Code Available 25 CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models Mar 31, 2024 Denoising Speech Synthesis
Code Code Available 25 PortaSpeech: Portable and High-Quality Generative Text-to-Speech Sep 30, 2021 text-to-speech Text to Speech
Code Code Available 25 LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT Oct 7, 2023 Audio captioning Automatic Speech Recognition
Code Code Available 25 StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis May 30, 2022 Data Augmentation Self-Supervised Learning
Code Code Available 25 FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis Apr 21, 2022 Denoising GPU
Code Code Available 25 DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism May 6, 2021 Generative Adversarial Network Singing Voice Synthesis
Code Code Available 25 KazEmoTTS: A Dataset for Kazakh Emotional Text-to-Speech Synthesis Apr 1, 2024 Speech Synthesis text-to-speech
Code Code Available 15 Learning Arousal-Valence Representation from Categorical Emotion Labels of Speech Nov 24, 2023 Dimensionality Reduction Emotion Classification
Code Code Available 15 KazakhTTS: An Open-Source Kazakh Text-to-Speech Synthesis Dataset Apr 17, 2021 Speech Synthesis text-to-speech
Code Code Available 15 Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models May 21, 2025 Bayesian Optimization Speech Synthesis
Code Code Available 15 Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning Nov 7, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Code Code Available 15 Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation Aug 3, 2023 Decoder Quantization
Code Code Available 15 Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search May 22, 2020 text-to-speech Text to Speech
Code Code Available 15 Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech May 13, 2021 Decoder Speech Synthesis
Code Code Available 15 Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech Feb 27, 2023 Speech Synthesis text-to-speech
Code Code Available 15 In Other News: A Bi-style Text-to-speech Model for Synthesizing Newscaster Voice with Limited Data Apr 4, 2019 Speech Synthesis text-to-speech
Code Code Available 15 Fine-grained style control in Transformer-based Text-to-speech Synthesis Oct 12, 2021 Inductive Bias Speech Synthesis
Code Code Available 15 FastSpeech 2: Fast and High-Quality End-to-End Text to Speech Jun 8, 2020 Knowledge Distillation Speech Synthesis
Code Code Available 15 Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis May 12, 2020 Speech Synthesis Style Transfer
Code Code Available 15 Exploring Transfer Learning for Low Resource Emotional TTS Jan 14, 2019 Deep Learning Emotional Speech Synthesis
Code Code Available 15 ArTST: Arabic Text and Speech Transformer Oct 25, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Code Code Available 15 MnTTS: An Open-Source Mongolian Text-to-Speech Synthesis Dataset and Accompanied Baseline Sep 22, 2022 Speech Synthesis text-to-speech
Code Code Available 15 Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration May 25, 2023 Speech Synthesis text-to-speech
Code Code Available 15