| Nix-TTS: Lightweight and End-to-End Text-to-Speech via Module-wise Distillation | Mar 29, 2022 | CPU, Decoder | Code Available | 2 | 5 |
| LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT | Oct 7, 2023 | Audio captioning, Automatic Speech Recognition | Code Available | 2 | 5 |
| NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers | Apr 18, 2023 | In-Context Learning, Speech Synthesis | Code Available | 2 | 5 |
| DailyTalk: Spoken Dialogue Dataset for Conversational Text-to-Speech | Jul 3, 2022 | text-to-speech, Text to Speech | Code Available | 2 | 5 |
| Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment | May 26, 2025 | text-to-speech, Text to Speech | Code Available | 2 | 5 |
| Generative Modeling for Low Dimensional Speech Attributes with Neural Spline Flows | Mar 3, 2022 | Speech Synthesis, text-to-speech | Code Available | 2 | 5 |
| RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching | Jun 20, 2025 | Scheduling, Speech Synthesis | Code Available | 2 | 5 |
| Fine-grained style control in Transformer-based Text-to-speech Synthesis | Oct 12, 2021 | Inductive Bias, Speech Synthesis | Code Available | 1 | 5 |
| FMFCC-A: A Challenging Mandarin Dataset for Synthetic Speech Detection | Oct 18, 2021 | Speech Synthesis, Synthetic Speech Detection | Code Available | 1 | 5 |
| Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations | Mar 3, 2023 | Speech Denoising, Speech Enhancement | Code Available | 1 | 5 |
| Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation | Jun 6, 2021 | text-to-speech, Text to Speech | Code Available | 1 | 5 |
| FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis | Jun 29, 2021 | Speech Synthesis, text-to-speech | Code Available | 1 | 5 |
| Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech | Nov 7, 2021 | Meta-Learning, Speech Synthesis | Code Available | 1 | 5 |
| Making More of Little Data: Improving Low-Resource Automatic Speech Recognition Using Data Augmentation | May 18, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 | 5 |
| Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation | Aug 3, 2023 | Decoder, Quantization | Code Available | 1 | 5 |
| LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation | Sep 23, 2024 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech | Oct 1, 2023 | speech-recognition, Speech Recognition | Code Available | 1 | 5 |
| Evaluating Parameter-Efficient Transfer Learning Approaches on SURE Benchmark for Speech Understanding | Mar 2, 2023 | Speech Synthesis, text-to-speech | Code Available | 1 | 5 |
| Limited Data Emotional Voice Conversion Leveraging Text-to-Speech: Two-stage Sequence-to-Sequence Training | Mar 31, 2021 | text-to-speech, Text to Speech | Code Available | 1 | 5 |
| From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer with Feedback Constraint | May 10, 2020 | Speaker Verification, Speech Synthesis | Code Available | 1 | 5 |
| FastPitch: Parallel Text-to-speech with Pitch Prediction | Jun 11, 2020 | Prediction, text-to-speech | Code Available | 1 | 5 |
| FastSpeech 2: Fast and High-Quality End-to-End Text to Speech | Jun 8, 2020 | Knowledge Distillation, Speech Synthesis | Code Available | 1 | 5 |
| ALIF: Low-Cost Adversarial Audio Attacks on Black-Box Speech Platforms using Linguistic Features | Aug 3, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 | 5 |
| FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis | Oct 27, 2022 | Speech Synthesis, text-to-speech | Code Available | 1 | 5 |
| MathReader : Text-to-Speech for Mathematical Documents | Jan 13, 2025 | Optical Character Recognition (OCR), text-to-speech | Code Available | 1 | 5 |
| Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis | May 12, 2020 | Speech Synthesis, Style Transfer | Code Available | 1 | 5 |
| Mitigating Unauthorized Speech Synthesis for Voice Protection | Oct 28, 2024 | Data Augmentation, Face Swapping | Code Available | 1 | 5 |
| Enhancing Speech Intelligibility in Text-To-Speech Synthesis using Speaking Style Conversion | Aug 13, 2020 | Speech Synthesis, text-to-speech | Code Available | 1 | 5 |
| End-to-end Lyrics Alignment for Polyphonic Music Using an Audio-to-Character Recognition Model | Feb 18, 2019 | Retrieval, text-to-speech | Code Available | 1 | 5 |
| End-to-End Adversarial Text-to-Speech | Jun 5, 2020 | Adversarial Text, Dynamic Time Warping | Code Available | 1 | 5 |
| End to End Lip Synchronization with a Temporal AutoEncoder | Mar 30, 2022 | text-to-speech, Text to Speech | Code Available | 1 | 5 |
| Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech | Sep 21, 2023 | text-to-speech, Text to Speech | Code Available | 1 | 5 |
| Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining | Jan 30, 2023 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| A Character-level Span-based Model for Mandarin Prosodic Structure Prediction | Mar 31, 2022 | Sentence, text-to-speech | Code Available | 1 | 5 |
| ShiftySpeech: A Large-Scale Synthetic Speech Dataset with Distribution Shifts | Feb 8, 2025 | Benchmarking, Self-Supervised Learning | Code Available | 1 | 5 |
| EMNS /Imz/ Corpus: An emotive single-speaker dataset for narrative storytelling in games, television and graphic novels | May 22, 2023 | Expressive Speech Synthesis, Speech Synthesis | Code Available | 1 | 5 |
| KazEmoTTS: A Dataset for Kazakh Emotional Text-to-Speech Synthesis | Apr 1, 2024 | Speech Synthesis, text-to-speech | Code Available | 1 | 5 |
| From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition | May 22, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 | 5 |
| Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech | Jul 17, 2024 | Speech-to-Speech Translation, text-to-speech | Code Available | 1 | 5 |
| Learning Arousal-Valence Representation from Categorical Emotion Labels of Speech | Nov 24, 2023 | Dimensionality Reduction, Emotion Classification | Code Available | 1 | 5 |
| EfficientSpeech: An On-Device Text to Speech Model | May 23, 2023 | CPU, model | Code Available | 1 | 5 |
| EmoSpeech: Guiding FastSpeech2 Towards Emotional Text to Speech | Jun 28, 2023 | Emotion Recognition, Speech Synthesis | Code Available | 1 | 5 |
| Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention | Oct 24, 2017 | text-to-speech, Text to Speech | Code Available | 1 | 5 |
| Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning | Jun 15, 2022 | Attribute, Emotion Classification | Code Available | 1 | 5 |
| ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet | Nov 29, 2021 | Spoken Language Understanding, text-to-speech | Code Available | 1 | 5 |
| Learning to Dub Movies via Hierarchical Prosody Models | Dec 8, 2022 | text-to-speech, Text to Speech | Code Available | 1 | 5 |
| LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search | Feb 8, 2021 | CPU, Model Compression | Code Available | 1 | 5 |
| Mixer-TTS: non-autoregressive, fast and compact text-to-speech model conditioned on language model embeddings | Oct 7, 2021 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS | Jun 26, 2024 | text-to-speech, Text to Speech | Code Available | 1 | 5 |
| InstructTTSEval: Benchmarking Complex Natural-Language Instruction Following in Text-to-Speech Systems | Jun 19, 2025 | Benchmarking, Descriptive | Code Available | 1 | 5 |