| Nix-TTS: Lightweight and End-to-End Text-to-Speech via Module-wise Distillation | Mar 29, 2022 | CPU, Decoder | Code Available | 2 |
| DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability | Jun 27, 2024 | Speech Synthesis, text-to-speech | Code Available | 2 |
| NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality | May 9, 2022 | Sentence, Speech Synthesis | Code Available | 2 |
| A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech | Feb 8, 2023 | Code Generation, Diversity | Code Available | 2 |
| NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers | Apr 18, 2023 | In-Context Learning, Speech Synthesis | Code Available | 2 |
| Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment | May 26, 2025 | text-to-speech, Text to Speech | Code Available | 2 |
| RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching | Jun 20, 2025 | Scheduling, Speech Synthesis | Code Available | 2 |
| Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation | Jun 6, 2021 | text-to-speech, Text to Speech | Code Available | 1 |
| MathReader : Text-to-Speech for Mathematical Documents | Jan 13, 2025 | Optical Character Recognition (OCR), text-to-speech | Code Available | 1 |
| Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations | Mar 3, 2023 | Speech Denoising, Speech Enhancement | Code Available | 1 |
| LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation | Sep 23, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| Limited Data Emotional Voice Conversion Leveraging Text-to-Speech: Two-stage Sequence-to-Sequence Training | Mar 31, 2021 | text-to-speech, Text to Speech | Code Available | 1 |
| Making More of Little Data: Improving Low-Resource Automatic Speech Recognition Using Data Augmentation | May 18, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining | Jan 30, 2023 | Language Modeling, Language Modelling | Code Available | 1 |
| ShiftySpeech: A Large-Scale Synthetic Speech Dataset with Distribution Shifts | Feb 8, 2025 | Benchmarking, Self-Supervised Learning | Code Available | 1 |
| Learning Arousal-Valence Representation from Categorical Emotion Labels of Speech | Nov 24, 2023 | Dimensionality Reduction, Emotion Classification | Code Available | 1 |
| Learning to Dub Movies via Hierarchical Prosody Models | Dec 8, 2022 | text-to-speech, Text to Speech | Code Available | 1 |
| KazakhTTS: An Open-Source Kazakh Text-to-Speech Synthesis Dataset | Apr 17, 2021 | Speech Synthesis, text-to-speech | Code Available | 1 |
| JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech | Mar 31, 2022 | text-to-speech, Text to Speech | Code Available | 1 |
| KazEmoTTS: A Dataset for Kazakh Emotional Text-to-Speech Synthesis | Apr 1, 2024 | Speech Synthesis, text-to-speech | Code Available | 1 |
| ALIF: Low-Cost Adversarial Audio Attacks on Black-Box Speech Platforms using Linguistic Features | Aug 3, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models | May 21, 2025 | Bayesian Optimization, Speech Synthesis | Code Available | 1 |
| Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation | Aug 3, 2023 | Decoder, Quantization | Code Available | 1 |
| Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech | Jul 17, 2024 | Speech-to-Speech Translation, text-to-speech | Code Available | 1 |
| LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search | Feb 8, 2021 | CPU, Model Compression | Code Available | 1 |
| Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech | Nov 7, 2021 | Meta-Learning, Speech Synthesis | Code Available | 1 |
| Mitigating Unauthorized Speech Synthesis for Voice Protection | Oct 28, 2024 | Data Augmentation, Face Swapping | Code Available | 1 |
| In Other News: A Bi-style Text-to-speech Model for Synthesizing Newscaster Voice with Limited Data | Apr 4, 2019 | Speech Synthesis, text-to-speech | Code Available | 1 |
| Improving fairness for spoken language understanding in atypical speech with Text-to-Speech | Nov 16, 2023 | Data Augmentation, Fairness | Code Available | 1 |
| Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning | Nov 7, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Improving TTS for Shanghainese: Addressing Tone Sandhi via Word Segmentation | Jul 30, 2023 | text-to-speech, Text to Speech | Code Available | 1 |
| InstructTTSEval: Benchmarking Complex Natural-Language Instruction Following in Text-to-Speech Systems | Jun 19, 2025 | Benchmarking, Descriptive | Code Available | 1 |
| HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks | Apr 6, 2024 | Domain Adaptation, Speech Synthesis | Code Available | 1 |
| HUI-Audio-Corpus-German: A high quality TTS dataset | Jun 11, 2021 | Text Normalization, text-to-speech | Code Available | 1 |
| IESTAC: English-Italian Parallel Corpus for End-to-End Speech-to-Text Machine Translation | Nov 1, 2020 | Dynamic Time Warping, Machine Translation | Code Available | 1 |
| HiFi-WaveGAN: Generative Adversarial Network with Auxiliary Spectrogram-Phase Loss for High-Fidelity Singing Voice Generation | Oct 23, 2022 | Generative Adversarial Network, Singing Voice Synthesis | Code Available | 1 |
| A Character-level Span-based Model for Mandarin Prosodic Structure Prediction | Mar 31, 2022 | Sentence, text-to-speech | Code Available | 1 |
| HM-Conformer: A Conformer-based audio deepfake detection system with hierarchical pooling and multi-level classification token aggregation methods | Sep 15, 2023 | Audio Deepfake Detection, DeepFake Detection | Code Available | 1 |
| Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech | Feb 27, 2023 | Speech Synthesis, text-to-speech | Code Available | 1 |
| ÌròyìnSpeech: A multi-purpose Yorùbá Speech Corpus | Jul 29, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Google Crowdsourced Speech Corpora and Related Open-Source Resources for Low-Resource Languages and Dialects: An Overview | Oct 14, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search | May 22, 2020 | text-to-speech, Text to Speech | Code Available | 1 |
| Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech | May 13, 2021 | Decoder, Speech Synthesis | Code Available | 1 |
| GUIRoboTron-Speech: Towards Automated GUI Agents Based on Speech Instructions | Jun 10, 2025 | text-to-speech, Text to Speech | Code Available | 1 |
| From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition | May 22, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| g2pM: A Neural Grapheme-to-Phoneme Conversion Package for Mandarin Chinese Based on a New Open Benchmark Dataset | Apr 7, 2020 | Grapheme-to-Phoneme Conversion, Polyphone disambiguation | Code Available | 1 |
| FMFCC-A: A Challenging Mandarin Dataset for Synthetic Speech Detection | Oct 18, 2021 | Speech Synthesis, Synthetic Speech Detection | Code Available | 1 |
| Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning | Jun 15, 2022 | Attribute, Emotion Classification | Code Available | 1 |
| From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer with Feedback Constraint | May 10, 2020 | Speaker Verification, Speech Synthesis | Code Available | 1 |
| Mixer-TTS: non-autoregressive, fast and compact text-to-speech model conditioned on language model embeddings | Oct 7, 2021 | Language Modeling, Language Modelling | Code Available | 1 |