CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training | May 23, 2025 | Automatic Speech Recognition, Emotion Recognition | Code available | 11
CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models | Dec 13, 2024 | In-Context Learning, Quantization | Code available | 11
CosyVoice: A Scalable Multilingual Zero-shot Text-to-speech Synthesizer based on Supervised Semantic Tokens | Jul 7, 2024 | Language Modelling, Large Language Model | Code available | 11
Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model | Jun 10, 2025 | Language Modeling, Language Modelling | Code available | 7
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming | Aug 29, 2024 | Speech Synthesis | Code available | 7
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers | Jan 5, 2023 | In-Context Learning, Language Modeling | Code available | 7
Better speech synthesis through scaling | May 12, 2023 | Image Generation, Speech Synthesis | Code available | 6
ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech | Nov 7, 2022 | Representation Learning, Speech Representation Learning | Code available | 6
PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit | May 20, 2022 | All, Automatic Speech Recognition (ASR) | Code available | 6
StreamSpeech: Simultaneous Speech-to-Speech Translation with Multi-task Learning | Jun 5, 2024 | Automatic Speech Recognition (ASR), de-en | Code available | 5
Very Low Complexity Speech Synthesis Using Framewise Autoregressive GAN (FARGAN) with Pitch Prediction | May 31, 2024 | Speech Synthesis | Code available | 5
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models | Jun 13, 2023 | Speech Synthesis, text-to-speech | Code available | 5
Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling | Mar 7, 2023 | In-Context Learning, Language Modeling | Code available | 5
ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching | Jun 16, 2025 | Decoder, Speech Synthesis | Code available | 4
Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis | Feb 6, 2025 | Speech Synthesis | Code available | 4
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis | Jun 1, 2023 | Audio Synthesis, Computational Efficiency | Code available | 4
Enhancing Suno's Bark Text-to-Speech Model: Addressing Limitations Through Meta's Encodec and Pre-Trained Hubert | Apr 18, 2023 | Audio Generation, Expressive Speech Synthesis | Code available | 4
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis | May 5, 2025 | Chatbot, Decoder | Code available | 3
MoonCast: High-Quality Zero-Shot Podcast Generation | Mar 18, 2025 | Speech Synthesis, text-to-speech | Code available | 3
Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey | Dec 9, 2024 | Speech Synthesis, Survey | Code available | 3
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model | Aug 30, 2024 | Audio Compression, Audio Generation | Code available | 3
Accelerating High-Fidelity Waveform Generation via Adversarial Flow Matching Optimization | Aug 15, 2024 | Speech Synthesis | Code available | 3
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation | Aug 14, 2024 | Speech Synthesis, text-to-speech | Code available | 3
ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control | Jun 3, 2024 | Speech Synthesis, text-to-speech | Code available | 3
FlashSpeech: Efficient Zero-Shot Speech Synthesis | Apr 23, 2024 | Rhythm, Speech Synthesis | Code available | 3
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models | Mar 5, 2024 | Quantization, Speech Synthesis | Code available | 3
HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis | Nov 21, 2023 | Speech Synthesis, Super-Resolution | Code available | 3
Matcha-TTS: A fast TTS architecture with conditional flow matching | Sep 6, 2023 | Acoustic Modelling, Decoder | Code available | 3
ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech | Jul 13, 2022 | Denoising, GPU | Code available | 3
BigVGAN: A Universal Neural Vocoder with Large-Scale Training | Jun 9, 2022 | Audio Generation, Audio Synthesis | Code available | 3
Real-Time Packet Loss Concealment With Mixed Generative and Predictive Model | May 11, 2022 | Packet Loss Concealment, Speech Enhancement | Code available | 3
End-to-end LPCNet: A Neural Vocoder With Fully-Differentiable LPC Estimation | Feb 23, 2022 | Speech Synthesis | Code available | 3
Neural Speech Synthesis on a Shoestring: Improving the Efficiency of LPCNet | Feb 22, 2022 | Speech Synthesis | Code available | 3
UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation | Jun 15, 2021 | Speech Synthesis, text-to-speech | Code available | 3
Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning | Jul 9, 2019 | Speech Synthesis, text-to-speech | Code available | 3
RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching | Jun 20, 2025 | Scheduling, Speech Synthesis | Code available | 2
Efficient Speech Language Modeling via Energy Distance in Continuous Latent Space | May 19, 2025 | Language Modeling, Language Modelling | Code available | 2
WaveFM: A High-Fidelity and Efficient Vocoder Based on Flow Matching | Mar 20, 2025 | Speech Synthesis | Code available | 2
PodAgent: A Comprehensive Framework for Podcast Generation | Mar 1, 2025 | Audio Generation, Speech Synthesis | Code available | 2
OpenOmni: Large Language Models Pivot Zero-shot Omnimodal Alignment across Language with Real-time Self-Aware Emotional Speech Synthesis | Jan 8, 2025 | Decoder, Emotional Speech Synthesis | Code available | 2
Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis | Oct 30, 2024 | Speech Synthesis, text-to-speech | Code available | 2
EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control | Oct 1, 2024 | Emotional Speech Synthesis, Speech Synthesis | Code available | 2
SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis | Sep 11, 2024 | Decoder, Speech Synthesis | Code available | 2
Sample-Efficient Diffusion for Text-To-Speech Synthesis | Sep 1, 2024 | Language Modeling, Language Modelling | Code available | 2
SpeechCraft: A Fine-grained Expressive Speech Dataset with Natural Language Description | Aug 24, 2024 | Descriptive, Speech Synthesis | Code available | 2
Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis | Jul 13, 2024 | Mamba, speech-recognition | Code available | 2
DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability | Jun 27, 2024 | Speech Synthesis, text-to-speech | Code available | 2
Small-E: Small Language Model with Linear Attention for Efficient Speech Synthesis | Jun 6, 2024 | Decoder, Inductive Bias | Code available | 2
Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness | Apr 10, 2024 | Speech Synthesis, text-to-speech | Code available | 2
CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models | Mar 31, 2024 | Denoising, Speech Synthesis | Code available | 2