| PresentAgent: Multimodal Agent for Presentation Video Generation | Jul 5, 2025 | text-to-speech, Text to Speech | Code Available | 2 | 5 |
| Nix-TTS: Lightweight and End-to-End Text-to-Speech via Module-wise Distillation | Mar 29, 2022 | CPU, Decoder | Code Available | 2 | 5 |
| Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment | May 26, 2025 | text-to-speech, Text to Speech | Code Available | 2 | 5 |
| LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT | Oct 7, 2023 | Audio captioning, Automatic Speech Recognition | Code Available | 2 | 5 |
| NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers | Apr 18, 2023 | In-Context Learning, Speech Synthesis | Code Available | 2 | 5 |
| RWKVTTS: Yet another TTS based on RWKV-7 | Apr 4, 2025 | Computational Efficiency, text-to-speech | Code Available | 2 | 5 |
| SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models | Aug 31, 2023 | Decoder, Language Modeling | Code Available | 2 | 5 |
| Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech | Nov 7, 2021 | Meta-Learning, Speech Synthesis | Code Available | 1 | 5 |
| g2pM: A Neural Grapheme-to-Phoneme Conversion Package for Mandarin Chinese Based on a New Open Benchmark Dataset | Apr 7, 2020 | Grapheme-to-Phoneme Conversion, Polyphone disambiguation | Code Available | 1 | 5 |
| Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations | Mar 3, 2023 | Speech Denoising, Speech Enhancement | Code Available | 1 | 5 |
| MathReader : Text-to-Speech for Mathematical Documents | Jan 13, 2025 | Optical Character Recognition (OCR), text-to-speech | Code Available | 1 | 5 |
| From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition | May 22, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 | 5 |
| FMFCC-A: A Challenging Mandarin Dataset for Synthetic Speech Detection | Oct 18, 2021 | Speech Synthesis, Synthetic Speech Detection | Code Available | 1 | 5 |
| Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation | Jun 6, 2021 | text-to-speech, Text to Speech | Code Available | 1 | 5 |
| FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis | Oct 27, 2022 | Speech Synthesis, text-to-speech | Code Available | 1 | 5 |
| Fine-grained style control in Transformer-based Text-to-speech Synthesis | Oct 12, 2021 | Inductive Bias, Speech Synthesis | Code Available | 1 | 5 |
| Making More of Little Data: Improving Low-Resource Automatic Speech Recognition Using Data Augmentation | May 18, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 | 5 |
| FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis | Jun 29, 2021 | Speech Synthesis, text-to-speech | Code Available | 1 | 5 |
| FastPitch: Parallel Text-to-speech with Pitch Prediction | Jun 11, 2020 | Prediction, text-to-speech | Code Available | 1 | 5 |
| FastSpeech 2: Fast and High-Quality End-to-End Text to Speech | Jun 8, 2020 | Knowledge Distillation, Speech Synthesis | Code Available | 1 | 5 |
| From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer with Feedback Constraint | May 10, 2020 | Speaker Verification, Speech Synthesis | Code Available | 1 | 5 |
| ALIF: Low-Cost Adversarial Audio Attacks on Black-Box Speech Platforms using Linguistic Features | Aug 3, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 | 5 |
| Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis | May 12, 2020 | Speech Synthesis, Style Transfer | Code Available | 1 | 5 |
| Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation | Aug 3, 2023 | Decoder, Quantization | Code Available | 1 | 5 |
| Mitigating Unauthorized Speech Synthesis for Voice Protection | Oct 28, 2024 | Data Augmentation, Face Swapping | Code Available | 1 | 5 |
| Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech | Oct 1, 2023 | speech-recognition, Speech Recognition | Code Available | 1 | 5 |
| Evaluating Parameter-Efficient Transfer Learning Approaches on SURE Benchmark for Speech Understanding | Mar 2, 2023 | Speech Synthesis, text-to-speech | Code Available | 1 | 5 |
| ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet | Nov 29, 2021 | Spoken Language Understanding, text-to-speech | Code Available | 1 | 5 |
| LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search | Feb 8, 2021 | CPU, Model Compression | Code Available | 1 | 5 |
| Limited Data Emotional Voice Conversion Leveraging Text-to-Speech: Two-stage Sequence-to-Sequence Training | Mar 31, 2021 | text-to-speech, Text to Speech | Code Available | 1 | 5 |
| End-to-end Lyrics Alignment for Polyphonic Music Using an Audio-to-Character Recognition Model | Feb 18, 2019 | Retrieval, text-to-speech | Code Available | 1 | 5 |
| End to End Lip Synchronization with a Temporal AutoEncoder | Mar 30, 2022 | text-to-speech, Text to Speech | Code Available | 1 | 5 |
| Enhancing Speech Intelligibility in Text-To-Speech Synthesis using Speaking Style Conversion | Aug 13, 2020 | Speech Synthesis, text-to-speech | Code Available | 1 | 5 |
| A Character-level Span-based Model for Mandarin Prosodic Structure Prediction | Mar 31, 2022 | Sentence, text-to-speech | Code Available | 1 | 5 |
| End-to-End Adversarial Text-to-Speech | Jun 5, 2020 | Adversarial Text, Dynamic Time Warping | Code Available | 1 | 5 |
| Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech | Sep 21, 2023 | text-to-speech, Text to Speech | Code Available | 1 | 5 |
| Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining | Jan 30, 2023 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| Learning to Dub Movies via Hierarchical Prosody Models | Dec 8, 2022 | text-to-speech, Text to Speech | Code Available | 1 | 5 |
| ShiftySpeech: A Large-Scale Synthetic Speech Dataset with Distribution Shifts | Feb 8, 2025 | Benchmarking, Self-Supervised Learning | Code Available | 1 | 5 |
| Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search | May 22, 2020 | text-to-speech, Text to Speech | Code Available | 1 | 5 |
| EMNS /Imz/ Corpus: An emotive single-speaker dataset for narrative storytelling in games, television and graphic novels | May 22, 2023 | Expressive Speech Synthesis, Speech Synthesis | Code Available | 1 | 5 |
| Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning | Jun 15, 2022 | Attribute, Emotion Classification | Code Available | 1 | 5 |
| EmoSpeech: Guiding FastSpeech2 Towards Emotional Text to Speech | Jun 28, 2023 | Emotion Recognition, Speech Synthesis | Code Available | 1 | 5 |
| Learning Arousal-Valence Representation from Categorical Emotion Labels of Speech | Nov 24, 2023 | Dimensionality Reduction, Emotion Classification | Code Available | 1 | 5 |
| LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation | Sep 23, 2024 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| Mixer-TTS: non-autoregressive, fast and compact text-to-speech model conditioned on language model embeddings | Oct 7, 2021 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| Effective Deep Learning Models for Automatic Diacritization of Arabic Text | Nov 1, 2020 | Arabic Text Diacritization, Decoder | Code Available | 1 | 5 |
| EdiTTS: Score-based Editing for Controllable Text-to-Speech | Oct 6, 2021 | Speech Synthesis, Speech-to-Text | Code Available | 1 | 5 |
| EditSpeech: A Text Based Speech Editing System Using Partial Inference and Bidirectional Fusion | Jul 4, 2021 | text-to-speech, Text to Speech | Code Available | 1 | 5 |
| Accent Estimation of Japanese Words from Their Surfaces and Romanizations for Building Large Vocabulary Accent Dictionaries | Sep 21, 2020 | Sentence, text-to-speech | Code Available | 1 | 5 |