Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers Jan 5, 2023 In-Context Learning Language Modeling
Code Code Available 7ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech Nov 7, 2022 Representation Learning Speech Representation Learning
Code Code Available 6PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit May 20, 2022 All Automatic Speech Recognition (ASR)
Code Code Available 6StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning Jun 5, 2024 Automatic Speech Recognition (ASR) de-en
Code Code Available 5Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling Mar 7, 2023 In-Context Learning Language Modeling
Code Code Available 5ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching Jun 16, 2025 Decoder Speech Synthesis
Code Code Available 4Enhancing Suno's Bark Text-to-Speech Model: Addressing Limitations Through Meta's Encodec and Pre-Trained Hubert Apr 18, 2023 Audio Generation Expressive Speech Synthesis
Code Code Available 4MoonCast: High-Quality Zero-Shot Podcast Generation Mar 18, 2025 Speech Synthesis text-to-speech
Code Code Available 3Matcha-TTS: A fast TTS architecture with conditional flow matching Sep 6, 2023 Acoustic Modelling Decoder
Code Code Available 3ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech Jul 13, 2022 Denoising GPU
Code Code Available 3Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis Oct 30, 2024 Speech Synthesis text-to-speech
Code Code Available 2SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis Sep 11, 2024 Decoder Speech Synthesis
Code Code Available 2Sample-Efficient Diffusion for Text-To-Speech Synthesis Sep 1, 2024 Language Modeling Language Modelling
Code Code Available 2EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech Jun 12, 2024 Emotional Speech Synthesis text-to-speech
Code Code Available 2CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models Mar 31, 2024 Denoising Speech Synthesis
Code Code Available 2Generative Adversarial Training for Text-to-Speech Synthesis Based on Raw Phonetic Input and Explicit Prosody Modelling Oct 14, 2023 Speech Synthesis text-to-speech
Code Code Available 2LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT Oct 7, 2023 Audio captioning Automatic Speech Recognition
Code Code Available 2FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec Sep 14, 2023 Automatic Speech Recognition speech-recognition
Code Code Available 2A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech Feb 8, 2023 Code Generation Diversity
Code Code Available 2Towards Building Text-To-Speech Systems for the Next Billion Users Nov 17, 2022 Diversity Speech Synthesis
Code Code Available 2StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis May 30, 2022 Data Augmentation Self-Supervised Learning
Code Code Available 2GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech May 15, 2022 Speech Synthesis Style Transfer
Code Code Available 2NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality May 9, 2022 Sentence Speech Synthesis
Code Code Available 2FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis Apr 21, 2022 Denoising GPU
Code Code Available 2iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform Mar 4, 2022 Speech Synthesis text-to-speech
Code Code Available 2Generative Modeling for Low Dimensional Speech Attributes with Neural Spline Flows Mar 3, 2022 Speech Synthesis text-to-speech
Code Code Available 2PortaSpeech: Portable and High-Quality Generative Text-to-Speech Sep 30, 2021 text-to-speech Text to Speech
Code Code Available 2DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism May 6, 2021 Generative Adversarial Network Singing Voice Synthesis
Code Code Available 2Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram Oct 25, 2019 Generative Adversarial Network GPU
Code Code Available 2FastSpeech: Fast, Robust and Controllable Text to Speech May 22, 2019 Decoder Speech Synthesis
Code Code Available 2FastSpeech: Fast,Robustand Controllable Text-to-Speech May 22, 2019 Decoder text-to-speech
Code Code Available 2Neural Speech Synthesis with Transformer Network Sep 19, 2018 Decoder Machine Translation
Code Code Available 2Efficient Neural Audio Synthesis Feb 23, 2018 Audio Synthesis CPU
Code Code Available 2Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models May 21, 2025 Bayesian Optimization Speech Synthesis
Code Code Available 1UMETTS: A Unified Framework for Emotional Text-to-Speech Synthesis with Multimodal Prompts Apr 29, 2024 Contrastive Learning Speech Synthesis
Code Code Available 1KazEmoTTS: A Dataset for Kazakh Emotional Text-to-Speech Synthesis Apr 1, 2024 Speech Synthesis text-to-speech
Code Code Available 1Learning Arousal-Valence Representation from Categorical Emotion Labels of Speech Nov 24, 2023 Dimensionality Reduction Emotion Classification
Code Code Available 1Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning Nov 7, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Code Code Available 1ArTST: Arabic Text and Speech Transformer Oct 25, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Code Code Available 1QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning Aug 31, 2023 Representation Learning Speech Representation Learning
Code Code Available 1Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation Aug 3, 2023 Decoder Quantization
Code Code Available 1Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration May 25, 2023 Speech Synthesis text-to-speech
Code Code Available 1Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech Feb 27, 2023 Speech Synthesis text-to-speech
Code Code Available 1RWEN-TTS: Relation-aware Word Encoding Network for Natural Text-to-Speech Synthesis Dec 15, 2022 Relation Speech Synthesis
Code Code Available 1MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset Dec 11, 2022 Speech Synthesis text-to-speech
Code Code Available 1OverFlow: Putting flows on top of neural transducers for better TTS Nov 13, 2022 Normalising Flows Speech Synthesis
Code Code Available 1Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder Nov 7, 2022 Speech Synthesis text-to-speech
Code Code Available 1MnTTS: An Open-Source Mongolian Text-to-Speech Synthesis Dataset and Accompanied Baseline Sep 22, 2022 Speech Synthesis text-to-speech
Code Code Available 1Automatic Prosody Annotation with Pre-Trained Text-Speech Model Jun 16, 2022 Speech Synthesis text-to-speech
Code Code Available 1Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition Mar 29, 2022 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Code Code Available 1