CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens. Jul 7, 2024. Tags: Language Modelling, Large Language Model. Code available: 11
CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training. May 23, 2025. Tags: Automatic Speech Recognition, Emotion Recognition. Code available: 11
CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models. Dec 13, 2024. Tags: In-Context Learning, Quantization. Code available: 11
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming. Aug 29, 2024. Tags: Speech Synthesis. Code available: 7
Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model. Jun 10, 2025. Tags: Language Modeling, Language Modelling. Code available: 7
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers. Jan 5, 2023. Tags: In-Context Learning, Language Modeling. Code available: 7
Better speech synthesis through scaling. May 12, 2023. Tags: Image Generation, Speech Synthesis. Code available: 6
ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech. Nov 7, 2022. Tags: Representation Learning, Speech Representation Learning. Code available: 6
PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit. May 20, 2022. Tags: All, Automatic Speech Recognition (ASR). Code available: 6
Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling. Mar 7, 2023. Tags: In-Context Learning, Language Modeling. Code available: 5
Very Low Complexity Speech Synthesis Using Framewise Autoregressive GAN (FARGAN) with Pitch Prediction. May 31, 2024. Tags: Speech Synthesis. Code available: 5
StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning. Jun 5, 2024. Tags: Automatic Speech Recognition (ASR), de-en. Code available: 5
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models. Jun 13, 2023. Tags: Speech Synthesis, text-to-speech. Code available: 5
Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis. Feb 6, 2025. Tags: Speech Synthesis. Code available: 4
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis. Jun 1, 2023. Tags: Audio Synthesis, Computational Efficiency. Code available: 4
ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching. Jun 16, 2025. Tags: Decoder, Speech Synthesis. Code available: 4
Enhancing Suno's Bark Text-to-Speech Model: Addressing Limitations Through Meta's Encodec and Pre-Trained Hubert. Apr 18, 2023. Tags: Audio Generation, Expressive Speech Synthesis. Code available: 4
MoonCast: High-Quality Zero-Shot Podcast Generation. Mar 18, 2025. Tags: Speech Synthesis, text-to-speech. Code available: 3
Matcha-TTS: A fast TTS architecture with conditional flow matching. Sep 6, 2023. Tags: Acoustic Modelling, Decoder. Code available: 3
Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey. Dec 9, 2024. Tags: Speech Synthesis, Survey. Code available: 3
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models. Mar 5, 2024. Tags: Quantization, Speech Synthesis. Code available: 3
BigVGAN: A Universal Neural Vocoder with Large-Scale Training. Jun 9, 2022. Tags: Audio Generation, Audio Synthesis. Code available: 3
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis. May 5, 2025. Tags: Chatbot, Decoder. Code available: 3
ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control. Jun 3, 2024. Tags: Speech Synthesis, text-to-speech. Code available: 3
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis. Nov 21, 2023. Tags: Speech Synthesis, Super-Resolution. Code available: 3
Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning. Jul 9, 2019. Tags: Speech Synthesis, text-to-speech. Code available: 3
UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation. Jun 15, 2021. Tags: Speech Synthesis, text-to-speech. Code available: 3
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model. Aug 30, 2024. Tags: Audio Compression, Audio Generation. Code available: 3
Real-Time Packet Loss Concealment With Mixed Generative and Predictive Model. May 11, 2022. Tags: Packet Loss Concealment, Speech Enhancement. Code available: 3
FlashSpeech: Efficient Zero-Shot Speech Synthesis. Apr 23, 2024. Tags: Rhythm, Speech Synthesis. Code available: 3
Neural Speech Synthesis on a Shoestring: Improving the Efficiency of LPCNet. Feb 22, 2022. Tags: Speech Synthesis. Code available: 3
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation. Aug 14, 2024. Tags: Speech Synthesis, text-to-speech. Code available: 3
End-to-end LPCNet: A Neural Vocoder With Fully-Differentiable LPC Estimation. Feb 23, 2022. Tags: Speech Synthesis. Code available: 3
Accelerating High-Fidelity Waveform Generation via Adversarial Flow Matching Optimization. Aug 15, 2024. Tags: Speech Synthesis. Code available: 3
ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech. Jul 13, 2022. Tags: Denoising, GPU. Code available: 3
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers. Apr 18, 2023. Tags: In-Context Learning, Speech Synthesis. Code available: 2
Conditional Diffusion Probabilistic Model for Speech Enhancement. Feb 10, 2022. Tags: model, Speech Enhancement. Code available: 2
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model. May 11, 2023. Tags: Denoising, GPU. Code available: 2
CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models. Mar 31, 2024. Tags: Denoising, Speech Synthesis. Code available: 2
PodAgent: A Comprehensive Framework for Podcast Generation. Mar 1, 2025. Tags: Audio Generation, Speech Synthesis. Code available: 2
Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness. Apr 10, 2024. Tags: Speech Synthesis, text-to-speech. Code available: 2
LPCNet: Improving Neural Speech Synthesis Through Linear Prediction. Oct 28, 2018. Tags: Prediction, Speech Synthesis. Code available: 2
BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network. Sep 6, 2023. Tags: Generative Adversarial Network, Speech Synthesis. Code available: 2
A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech. Feb 8, 2023. Tags: Code Generation, Diversity. Code available: 2
BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis. Mar 25, 2022. Tags: Image Generation, Speech Synthesis. Code available: 2
iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform. Mar 4, 2022. Tags: Speech Synthesis, text-to-speech. Code available: 2
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT. Oct 7, 2023. Tags: Audio captioning, Automatic Speech Recognition. Code available: 2
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis. Oct 12, 2020. Tags: CPU, GPU. Code available: 2
HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform. Sep 18, 2023. Tags: Speech Synthesis. Code available: 2
Generative Modeling for Low Dimensional Speech Attributes with Neural Spline Flows. Mar 3, 2022. Tags: Speech Synthesis, text-to-speech. Code available: 2