| Differentiable Reward Optimization for LLM based TTS system | Jul 8, 2025 | text-to-speech, Text to Speech | Code Available | 2 | 5 |
| SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models | Aug 31, 2023 | Decoder, Language Modeling | Code Available | 2 | 5 |
| DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability | Jun 27, 2024 | Speech Synthesis, text-to-speech | Code Available | 2 | 5 |
| SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis | Sep 11, 2024 | Decoder, Speech Synthesis | Code Available | 2 | 5 |
| FastSpeech: Fast, Robust and Controllable Text-to-Speech | May 22, 2019 | Decoder, text-to-speech | Code Available | 2 | 5 |
| FastSpeech: Fast, Robust and Controllable Text to Speech | May 22, 2019 | Decoder, Speech Synthesis | Code Available | 2 | 5 |
| DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors | Jun 17, 2024 | text-to-speech, Text to Speech | Code Available | 2 | 5 |
| TokenSynth: A Token-based Neural Synthesizer for Instrument Cloning and Text-to-Instrument | Feb 13, 2025 | Audio Generation, Decoder | Code Available | 2 | 5 |
| Small-E: Small Language Model with Linear Attention for Efficient Speech Synthesis | Jun 6, 2024 | Decoder, Inductive Bias | Code Available | 2 | 5 |
| FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec | Sep 14, 2023 | Automatic Speech Recognition, speech-recognition | Code Available | 2 | 5 |
| TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation | May 28, 2024 | Machine Translation, speech-recognition | Code Available | 2 | 5 |
| TTSDS -- Text-to-Speech Distribution Score | Jul 17, 2024 | text-to-speech, Text to Speech | Code Available | 2 | 5 |
| SeamlessM4T: Massively Multilingual & Multimodal Machine Translation | Aug 22, 2023 | Automatic Speech Recognition, Machine Translation | Code Available | 2 | 5 |
| GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech | May 15, 2022 | Speech Synthesis, Style Transfer | Code Available | 2 | 5 |
| DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism | May 6, 2021 | Generative Adversarial Network, Singing Voice Synthesis | Code Available | 2 | 5 |
| Sample-Efficient Diffusion for Text-To-Speech Synthesis | Sep 1, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Source-Filter-Based Generative Adversarial Neural Vocoder for High Fidelity Speech Synthesis | Apr 26, 2023 | Speech Synthesis, text-to-speech | Code Available | 2 | 5 |
| VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design | Jul 31, 2023 | Computational Efficiency, text-to-speech | Code Available | 2 | 5 |
| RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching | Jun 20, 2025 | Scheduling, Speech Synthesis | Code Available | 2 | 5 |
| CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations | Apr 10, 2024 | Dialogue Generation, text-to-speech | Code Available | 2 | 5 |
| Recent Advances in Speech Language Models: A Survey | Oct 1, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2 | 5 |
| PortaSpeech: Portable and High-Quality Generative Text-to-Speech | Sep 30, 2021 | text-to-speech, Text to Speech | Code Available | 2 | 5 |
| PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-controllable TTS | Feb 24, 2023 | Decoder, text-to-speech | Code Available | 2 | 5 |
| PresentAgent: Multimodal Agent for Presentation Video Generation | Jul 5, 2025 | text-to-speech, Text to Speech | Code Available | 2 | 5 |
| RingFormer: A Neural Vocoder with Ring Attention and Convolution-Augmented Transformer | Jan 2, 2025 | Audio Generation, text-to-speech | Code Available | 2 | 5 |
| Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram | Oct 25, 2019 | Generative Adversarial Network, GPU | Code Available | 2 | 5 |
| CATT: Character-based Arabic Tashkeel Transformer | Jul 3, 2024 | Arabic Text Diacritization, Decoder | Code Available | 2 | 5 |
| An Automated End-to-End Open-Source Software for High-Quality Text-to-Speech Dataset Generation | Feb 26, 2024 | Dataset Generation, text-to-speech | Code Available | 2 | 5 |
| NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality | May 9, 2022 | Sentence, Speech Synthesis | Code Available | 2 | 5 |
| Neural Speech Synthesis with Transformer Network | Sep 19, 2018 | Decoder, Machine Translation | Code Available | 2 | 5 |
| Nix-TTS: Lightweight and End-to-End Text-to-Speech via Module-wise Distillation | Mar 29, 2022 | CPU, Decoder | Code Available | 2 | 5 |
| RWKVTTS: Yet another TTS based on RWKV-7 | Apr 4, 2025 | Computational Efficiency, text-to-speech | Code Available | 2 | 5 |
| A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech | Feb 8, 2023 | Code Generation, Diversity | Code Available | 2 | 5 |
| Audio Deepfake Detection with Self-Supervised XLS-R and SLS Classifier | Oct 28, 2024 | Audio Deepfake Detection, Audio Generation | Code Available | 2 | 5 |
| Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis | Oct 30, 2024 | Speech Synthesis, text-to-speech | Code Available | 2 | 5 |
| PAM: Prompting Audio-Language Models for Audio Quality Assessment | Feb 1, 2024 | Audio Quality Assessment, Music Generation | Code Available | 2 | 5 |
| CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model | May 11, 2023 | Denoising, GPU | Code Available | 2 | 5 |
| Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness | Apr 10, 2024 | Speech Synthesis, text-to-speech | Code Available | 2 | 5 |
| Lightweight and High-Fidelity End-to-End Text-to-Speech with Multi-Band Generation and Inverse Short-Time Fourier Transform | Oct 28, 2022 | CPU, Knowledge Distillation | Code Available | 2 | 5 |
| CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models | Mar 31, 2024 | Denoising, Speech Synthesis | Code Available | 2 | 5 |
| LPCNet: Improving Neural Speech Synthesis Through Linear Prediction | Oct 28, 2018 | Prediction, Speech Synthesis | Code Available | 2 | 5 |
| EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control | Oct 1, 2024 | Emotional Speech Synthesis, Speech Synthesis | Code Available | 2 | 5 |
| iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform | Mar 4, 2022 | Speech Synthesis, text-to-speech | Code Available | 2 | 5 |
| LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT | Oct 7, 2023 | Audio captioning, Automatic Speech Recognition | Code Available | 2 | 5 |
| EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector | Nov 4, 2024 | Decoder, Emotional Speech Synthesis | Code Available | 2 | 5 |
| IndicVoices-R: Unlocking a Massive Multilingual Multi-speaker Speech Corpus for Scaling Indian TTS | Sep 9, 2024 | Denoising, Speech Enhancement | Code Available | 2 | 5 |
| DailyTalk: Spoken Dialogue Dataset for Conversational Text-to-Speech | Jul 3, 2022 | text-to-speech, Text to Speech | Code Available | 2 | 5 |
| Scaling Rich Style-Prompted Text-to-Speech Datasets | Mar 6, 2025 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| Efficient Neural Audio Synthesis | Feb 23, 2018 | Audio Synthesis, CPU | Code Available | 2 | 5 |
| FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis | Apr 21, 2022 | Denoising, GPU | Code Available | 2 | 5 |