| FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis | Jun 29, 2021 | Speech Synthesis, text-to-speech | Code Available | 1 | 5 |
| Accent Estimation of Japanese Words from Their Surfaces and Romanizations for Building Large Vocabulary Accent Dictionaries | Sep 21, 2020 | Sentence, text-to-speech | Code Available | 1 | 5 |
| Multi-Task Learning for Front-End Text Processing in TTS | Jan 12, 2024 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| A Toolbox for Construction and Analysis of Speech Datasets | Apr 11, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 | 5 |
| FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis | Oct 27, 2022 | Speech Synthesis, text-to-speech | Code Available | 1 | 5 |
| Evaluating Parameter-Efficient Transfer Learning Approaches on SURE Benchmark for Speech Understanding | Mar 2, 2023 | Speech Synthesis, text-to-speech | Code Available | 1 | 5 |
| Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of SSWP | Sep 11, 2023 | text-to-speech, Text to Speech | Code Available | 1 | 5 |
| AdaSpeech: Adaptive Text to Speech for Custom Voice | Mar 1, 2021 | text-to-speech, Text to Speech | Code Available | 1 | 5 |
| End to End Lip Synchronization with a Temporal AutoEncoder | Mar 30, 2022 | text-to-speech, Text to Speech | Code Available | 1 | 5 |
| An End-to-end Chinese Text Normalization Model based on Rule-guided Flat-Lattice Transformer | Mar 31, 2022 | Text Normalization, text-to-speech | Code Available | 1 | 5 |
| End-to-end Lyrics Alignment for Polyphonic Music Using an Audio-to-Character Recognition Model | Feb 18, 2019 | Retrieval, text-to-speech | Code Available | 1 | 5 |
| MultiSpeech: Multi-Speaker Text to Speech with Transformer | Jun 8, 2020 | Decoder, text-to-speech | Code Available | 1 | 5 |
| MnTTS: An Open-Source Mongolian Text-to-Speech Synthesis Dataset and Accompanied Baseline | Sep 22, 2022 | Speech Synthesis, text-to-speech | Code Available | 1 | 5 |
| EmoSpeech: Guiding FastSpeech2 Towards Emotional Text to Speech | Jun 28, 2023 | Emotion Recognition, Speech Synthesis | Code Available | 1 | 5 |
| More than Words: In-the-Wild Visually-Driven Prosody for Text-to-Speech | Nov 19, 2021 | text-to-speech, Text to Speech | Code Available | 1 | 5 |
| EMNS /Imz/ Corpus: An emotive single-speaker dataset for narrative storytelling in games, television and graphic novels | May 22, 2023 | Expressive Speech Synthesis, Speech Synthesis | Code Available | 1 | 5 |
| An Efficient Membership Inference Attack for the Diffusion Model by Proximal Initialization | May 26, 2023 | Audio Generation, Inference Attack | Code Available | 1 | 5 |
| UMETTS: A Unified Framework for Emotional Text-to-Speech Synthesis with Multimodal Prompts | Apr 29, 2024 | Contrastive Learning, Speech Synthesis | Code Available | 1 | 5 |
| MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset | Dec 11, 2022 | Speech Synthesis, text-to-speech | Code Available | 1 | 5 |
| AudioMarkBench: Benchmarking Robustness of Audio Watermarking | Jun 11, 2024 | Benchmarking, text-to-speech | Code Available | 1 | 5 |
| AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data | Apr 20, 2021 | Decoder, text-to-speech | Code Available | 1 | 5 |
| Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models | May 21, 2025 | Bayesian Optimization, Speech Synthesis | Code Available | 1 | 5 |
| Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention | Oct 24, 2017 | text-to-speech, Text to Speech | Code Available | 1 | 5 |
| Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech | Sep 21, 2023 | text-to-speech, Text to Speech | Code Available | 1 | 5 |
| EfficientSpeech: An On-Device Text to Speech Model | May 23, 2023 | CPU, model | Code Available | 1 | 5 |
| EdiTTS: Score-based Editing for Controllable Text-to-Speech | Oct 6, 2021 | Speech Synthesis, Speech-to-Text | Code Available | 1 | 5 |
| Effective Deep Learning Models for Automatic Diacritization of Arabic Text | Nov 1, 2020 | Arabic Text Diacritization, Decoder | Code Available | 1 | 5 |
| Enhancing Speech Intelligibility in Text-To-Speech Synthesis using Speaking Style Conversion | Aug 13, 2020 | Speech Synthesis, text-to-speech | Code Available | 1 | 5 |
| Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder | Nov 7, 2022 | Speech Synthesis, text-to-speech | Code Available | 1 | 5 |
| ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet | Nov 29, 2021 | Spoken Language Understanding, text-to-speech | Code Available | 1 | 5 |
| Automatic Prosody Annotation with Pre-Trained Text-Speech Model | Jun 16, 2022 | Speech Synthesis, text-to-speech | Code Available | 1 | 5 |
| Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech | Oct 1, 2023 | speech-recognition, Speech Recognition | Code Available | 1 | 5 |
| EditSpeech: A Text Based Speech Editing System Using Partial Inference and Bidirectional Fusion | Jul 4, 2021 | text-to-speech, Text to Speech | Code Available | 1 | 5 |
| E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS | Jun 26, 2024 | text-to-speech, Text to Speech | Code Available | 1 | 5 |
| End-to-End Adversarial Text-to-Speech | Jun 5, 2020 | Adversarial Text, Dynamic Time Warping | Code Available | 1 | 5 |
| Mixer-TTS: non-autoregressive, fast and compact text-to-speech model conditioned on language model embeddings | Oct 7, 2021 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration | May 25, 2023 | Speech Synthesis, text-to-speech | Code Available | 1 | 5 |
| Diffusion-Based Mel-Spectrogram Enhancement for Personalized Speech Synthesis with Found Data | May 18, 2023 | Speech Enhancement, Speech Synthesis | Code Available | 1 | 5 |
| Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech | Nov 7, 2021 | Meta-Learning, Speech Synthesis | Code Available | 1 | 5 |
| FastPitch: Parallel Text-to-speech with Pitch Prediction | Jun 11, 2020 | Prediction, text-to-speech | Code Available | 1 | 5 |
| Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding | Aug 12, 2020 | Speech Synthesis, text-to-speech | Code Available | 1 | 5 |
| Attentive Sequence-to-Sequence Learning for Diacritic Restoration of Yorùbá Language Text | Apr 3, 2018 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 | 5 |
| Dreamento: an open-source dream engineering toolbox for sleep EEG wearables | Jul 8, 2022 | EEG, Electroencephalogram (EEG) | Code Available | 1 | 5 |
| FMFCC-A: A Challenging Mandarin Dataset for Synthetic Speech Detection | Oct 18, 2021 | Speech Synthesis, Synthetic Speech Detection | Code Available | 1 | 5 |
| MathReader : Text-to-Speech for Mathematical Documents | Jan 13, 2025 | Optical Character Recognition (OCR), text-to-speech | Code Available | 1 | 5 |
| From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition | May 22, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 | 5 |
| BASPRO: a balanced script producer for speech corpus collection based on the genetic algorithm | Dec 11, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 | 5 |
| Attention model for articulatory features detection | Jul 2, 2019 | Manner Of Articulation Detection, model | Code Available | 1 | 5 |
| ADAPTERMIX: Exploring the Efficacy of Mixture of Adapters for Low-Resource TTS Adaptation | May 29, 2023 | Speech Synthesis, text-to-speech | Code Available | 1 | 5 |
| DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training | Jul 31, 2023 | Denoising, Expressive Speech Synthesis | Code Available | 1 | 5 |