| Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search | May 22, 2020 | text-to-speech, Text to Speech | Code Available | 1 |
| ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech | Jul 19, 2018 | Speech Synthesis, text-to-speech | Code Available | 1 |
| EmoSpeech: Guiding FastSpeech2 Towards Emotional Text to Speech | Jun 28, 2023 | Emotion Recognition, Speech Synthesis | Code Available | 1 |
| EfficientSpeech: An On-Device Text to Speech Model | May 23, 2023 | CPU, model | Code Available | 1 |
| Effective Deep Learning Models for Automatic Diacritization of Arabic Text | Nov 1, 2020 | Arabic Text Diacritization, Decoder | Code Available | 1 |
| E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS | Jun 26, 2024 | text-to-speech, Text to Speech | Code Available | 1 |
| EditSpeech: A Text Based Speech Editing System Using Partial Inference and Bidirectional Fusion | Jul 4, 2021 | text-to-speech, Text to Speech | Code Available | 1 |
| Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention | Oct 24, 2017 | text-to-speech, Text to Speech | Code Available | 1 |
| DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training | Jul 31, 2023 | Denoising, Expressive Speech Synthesis | Code Available | 1 |
| Diffusion-Based Mel-Spectrogram Enhancement for Personalized Speech Synthesis with Found Data | May 18, 2023 | Speech Enhancement, Speech Synthesis | Code Available | 1 |
| Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech | Jun 5, 2022 | Polyphone disambiguation, text-to-speech | Code Available | 1 |
| ArTST: Arabic Text and Speech Transformer | Oct 25, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Developing multilingual speech synthesis system for Ojibwe, Mi'kmaq, and Maliseet | Feb 4, 2025 | Speech Synthesis, text-to-speech | Code Available | 1 |
| Deep Learning Based Assessment of Synthetic Speech Naturalness | Apr 23, 2021 | Deep Learning, Prediction | Code Available | 1 |
| EdiTTS: Score-based Editing for Controllable Text-to-Speech | Oct 6, 2021 | Speech Synthesis, Speech-to-Text | Code Available | 1 |
| Dreamento: an open-source dream engineering toolbox for sleep EEG wearables | Jul 8, 2022 | EEG, Electroencephalogram (EEG) | Code Available | 1 |
| Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech | Sep 21, 2023 | text-to-speech, Text to Speech | Code Available | 1 |
| Google Crowdsourced Speech Corpora and Related Open-Source Resources for Low-Resource Languages and Dialects: An Overview | Oct 14, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| EMNS /Imz/ Corpus: An emotive single-speaker dataset for narrative storytelling in games, television and graphic novels | May 22, 2023 | Expressive Speech Synthesis, Speech Synthesis | Code Available | 1 |
| Making More of Little Data: Improving Low-Resource Automatic Speech Recognition Using Data Augmentation | May 18, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning | Aug 31, 2023 | Representation Learning, Speech Representation Learning | Code Available | 1 |
| Clip-TTS: Contrastive Text-content and Mel-spectrogram, A High-Quality Text-to-Speech Method based on Contextual Semantic Understanding | Feb 26, 2025 | text-to-speech, Text to Speech | Unverified | 0 |
| ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus | Feb 28, 2023 | Speech Synthesis, text-to-speech | Unverified | 0 |
| ArmanTTS single-speaker Persian dataset | Apr 7, 2023 | text-to-speech, Text to Speech | Unverified | 0 |
| CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech | Apr 3, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| A Review of Multi-Modal Large Language and Vision Models | Mar 28, 2024 | Image Captioning, Prompt Engineering | Unverified | 0 |
| A Human-in-the-Loop Approach to Improving Cross-Text Prosody Transfer | Jun 6, 2024 | text-to-speech, Text to Speech | Unverified | 0 |
| CHULA TTS: A Modularized Text-To-Speech Framework | Dec 1, 2014 | text-to-speech, Text to Speech | Unverified | 0 |
| CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network | May 17, 2019 | Decoder, Sentence | Unverified | 0 |
| A Review of Deep Learning Techniques for Speech Processing | Apr 30, 2023 | Automatic Speech Recognition, Deep Learning | Unverified | 0 |
| ChatAnything: Facetime Chat with LLM-Enhanced Personas | Nov 12, 2023 | Image Generation, In-Context Learning | Unverified | 0 |
| Character-Level Bangla Text-to-IPA Transcription Using Transformer Architecture with Sequence Alignment | Nov 7, 2023 | Decoder, Position | Unverified | 0 |
| A review-based study on different Text-to-Speech technologies | Dec 17, 2023 | text-to-speech, Text to Speech | Unverified | 0 |
| A Generative Model of a Pronunciation Lexicon for Hindi | May 6, 2017 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| A Cost Efficient Approach to Correct OCR Errors in Large Document Collections | May 28, 2019 | Clustering, Language Modelling | Unverified | 0 |
| Characteristic-Specific Partial Fine-Tuning for Efficient Emotion and Speaker Adaptation in Codec Language Text-to-Speech Models | Jan 24, 2025 | Emotion Classification, Speaker Identification | Unverified | 0 |
| Chain-of-Thought Training for Open E2E Spoken Dialogue Systems | May 31, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| A Fully Time-domain Neural Model for Subband-based Speech Synthesizer | Oct 22, 2018 | text-to-speech, Text to Speech | Unverified | 0 |
| CASSANDRA: A multipurpose configurable voice-enabled human-computer-interface | Apr 1, 2017 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Arabic Text-To-Speech (TTS) Data Preparation | Apr 7, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| A Bengali HMM Based Speech Synthesis System | Jun 16, 2014 | Speech Synthesis, text-to-speech | Unverified | 0 |
| CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech | Jun 3, 2025 | Speech Synthesis, text-to-speech | Unverified | 0 |
| A Proposal of Automatic Error Correction in Text | Sep 24, 2021 | Information Retrieval, Language Modelling | Unverified | 0 |
| Can we steal your vocal identity from the Internet?: Initial investigation of cloning Obama's voice using GAN, WaveNet and low-quality found data | Mar 2, 2018 | Generative Adversarial Network, Speech Enhancement | Unverified | 0 |
| Can we reconstruct a dysarthric voice with the large speech model Parler TTS? | Jun 4, 2025 | text-to-speech, Text to Speech | Unverified | 0 |
| A Preliminary Analysis of Automatic Word and Syllable Prominence Detection in Non-Native Speech With Text-to-Speech Prosody Embeddings | Dec 11, 2024 | text-to-speech, Text to Speech | Unverified | 0 |
| A Corpus of Neutral Voice Speech in Brazilian Portuguese | May 21, 2021 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Listening while Speaking and Visualizing: Improving ASR through Multimodal Chain | Jun 3, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| CSSinger: End-to-End Chunkwise Streaming Singing Voice Synthesis System Based on Conditional Variational Autoencoder | Dec 12, 2024 | Audio Synthesis, Singing Voice Synthesis | Unverified | 0 |
| Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data? | Jun 11, 2024 | Contrastive Learning, Speech Synthesis | Unverified | 0 |