CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models | Dec 13, 2024 | In-Context Learning, Quantization | Code Available 115
CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens | Jul 7, 2024 | Language Modelling, Large Language Model | Code Available 115
CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training | May 23, 2025 | Automatic Speech Recognition, Emotion Recognition | Code Available 115
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming | Aug 29, 2024 | Speech Synthesis | Code Available 75
Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model | Jun 10, 2025 | Language Modeling, Language Modelling | Code Available 75
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers | Jan 5, 2023 | In-Context Learning, Language Modeling | Code Available 75
Better speech synthesis through scaling | May 12, 2023 | Image Generation, Speech Synthesis | Code Available 65
ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech | Nov 7, 2022 | Representation Learning, Speech Representation Learning | Code Available 65
PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit | May 20, 2022 | All, Automatic Speech Recognition (ASR) | Code Available 65
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models | Jun 13, 2023 | Speech Synthesis, text-to-speech | Code Available 55
Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling | Mar 7, 2023 | In-Context Learning, Language Modeling | Code Available 55
Very Low Complexity Speech Synthesis Using Framewise Autoregressive GAN (FARGAN) with Pitch Prediction | May 31, 2024 | Speech Synthesis | Code Available 55
StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning | Jun 5, 2024 | Automatic Speech Recognition (ASR), de-en | Code Available 55
Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis | Feb 6, 2025 | Speech Synthesis | Code Available 45
ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching | Jun 16, 2025 | Decoder, Speech Synthesis | Code Available 45
Enhancing Suno's Bark Text-to-Speech Model: Addressing Limitations Through Meta's Encodec and Pre-Trained Hubert | Apr 18, 2023 | Audio Generation, Expressive Speech Synthesis | Code Available 45
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis | Jun 1, 2023 | Audio Synthesis, Computational Efficiency | Code Available 45
MoonCast: High-Quality Zero-Shot Podcast Generation | Mar 18, 2025 | Speech Synthesis, text-to-speech | Code Available 35
Matcha-TTS: A fast TTS architecture with conditional flow matching | Sep 6, 2023 | Acoustic Modelling, Decoder | Code Available 35
BigVGAN: A Universal Neural Vocoder with Large-Scale Training | Jun 9, 2022 | Audio Generation, Audio Synthesis | Code Available 35
UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation | Jun 15, 2021 | Speech Synthesis, text-to-speech | Code Available 35
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models | Mar 5, 2024 | Quantization, Speech Synthesis | Code Available 35
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis | Nov 21, 2023 | Speech Synthesis, Super-Resolution | Code Available 35
Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey | Dec 9, 2024 | Speech Synthesis, Survey | Code Available 35
FlashSpeech: Efficient Zero-Shot Speech Synthesis | Apr 23, 2024 | Rhythm, Speech Synthesis | Code Available 35
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis | May 5, 2025 | Chatbot, Decoder | Code Available 35
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model | Aug 30, 2024 | Audio Compression, Audio Generation | Code Available 35
End-to-end LPCNet: A Neural Vocoder With Fully-Differentiable LPC Estimation | Feb 23, 2022 | Speech Synthesis | Code Available 35
ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech | Jul 13, 2022 | Denoising, GPU | Code Available 35
Real-Time Packet Loss Concealment With Mixed Generative and Predictive Model | May 11, 2022 | Packet Loss Concealment, Speech Enhancement | Code Available 35
Neural Speech Synthesis on a Shoestring: Improving the Efficiency of LPCNet | Feb 22, 2022 | Speech Synthesis | Code Available 35
ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control | Jun 3, 2024 | Speech Synthesis, text-to-speech | Code Available 35
Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning | Jul 9, 2019 | Speech Synthesis, text-to-speech | Code Available 35
Accelerating High-Fidelity Waveform Generation via Adversarial Flow Matching Optimization | Aug 15, 2024 | Speech Synthesis | Code Available 35
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation | Aug 14, 2024 | Speech Synthesis, text-to-speech | Code Available 35
LPCNet: Improving Neural Speech Synthesis Through Linear Prediction | Oct 28, 2018 | Prediction, Speech Synthesis | Code Available 25
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers | Apr 18, 2023 | In-Context Learning, Speech Synthesis | Code Available 25
BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network | Sep 6, 2023 | Generative Adversarial Network, Speech Synthesis | Code Available 25
BDDM: Bilateral Denoising Diffusion Models for Fast and High-Quality Speech Synthesis | Mar 25, 2022 | Image Generation, Speech Synthesis | Code Available 25
PodAgent: A Comprehensive Framework for Podcast Generation | Mar 1, 2025 | Audio Generation, Speech Synthesis | Code Available 25
Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis | Oct 30, 2024 | Speech Synthesis, text-to-speech | Code Available 25
Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness | Apr 10, 2024 | Speech Synthesis, text-to-speech | Code Available 25
HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform | Sep 18, 2023 | Speech Synthesis | Code Available 25
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model | May 11, 2023 | Denoising, GPU | Code Available 25
Improving Opus Low Bit Rate Quality with Neural Speech Synthesis | Aug 10, 2020 | Decoder, Speech Synthesis | Code Available 25
Conditional Diffusion Probabilistic Model for Speech Enhancement | Feb 10, 2022 | model, Speech Enhancement | Code Available 25
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis | Oct 12, 2020 | CPU, GPU | Code Available 25
iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform | Mar 4, 2022 | Speech Synthesis, text-to-speech | Code Available 25
Generative Adversarial Training for Text-to-Speech Synthesis Based on Raw Phonetic Input and Explicit Prosody Modelling | Oct 14, 2023 | Speech Synthesis, text-to-speech | Code Available 25
FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec | Sep 14, 2023 | Automatic Speech Recognition, speech-recognition | Code Available 25