| Integrated Speech and Gesture Synthesis | Aug 25, 2021 | Speech Synthesis, text-to-speech | Code Available | 0 | 5 |
| High Fidelity Speech Synthesis with Adversarial Networks | Sep 25, 2019 | Generative Adversarial Network, Speech Synthesis | Code Available | 0 | 5 |
| Hierarchical Generative Modeling for Controllable Speech Synthesis | Oct 16, 2018 | Attribute, Speech Synthesis | Code Available | 0 | 5 |
| Humane Speech Synthesis through Zero-Shot Emotion and Disfluency Generation | Mar 31, 2024 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| Adaptation of Tacotron2-based Text-To-Speech for Articulatory-to-Acoustic Mapping using Ultrasound Tongue Imaging | Jul 26, 2021 | text-to-speech, Text to Speech | Code Available | 0 | 5 |
| Generating Synthetic Speech from SpokenVocab for Speech Translation | Oct 15, 2022 | Data Augmentation, Machine Translation | Code Available | 0 | 5 |
| GELP: GAN-Excited Linear Prediction for Speech Synthesis from Mel-spectrogram | Apr 8, 2019 | Speech Synthesis, text-to-speech | Code Available | 0 | 5 |
| Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition | Aug 17, 2024 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| FPETS : Fully Parallel End-to-End Text-to-Speech System | Dec 12, 2018 | text-to-speech, Text to Speech | Code Available | 0 | 5 |
| Continuous Speech Tokenizer in Text To Speech | Oct 22, 2024 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| AlignTTS: Efficient Feed-Forward Text-to-Speech System without Explicit Alignment | Mar 4, 2020 | text-to-speech, Text to Speech | Code Available | 0 | 5 |
| Few-Shot Speech Deepfake Detection Adaptation with Gaussian Processes | May 29, 2025 | Audio Deepfake Detection, DeepFake Detection | Code Available | 0 | 5 |
| Facial Landmark Predictions with Applications to Metaverse | Sep 29, 2022 | Decoder, text-to-speech | Code Available | 0 | 5 |
| fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit | Sep 14, 2021 | Speech Synthesis, text-to-speech | Code Available | 0 | 5 |
| Expediting TTS Synthesis with Adversarial Vocoding | Apr 16, 2019 | text-to-speech, Text to Speech | Code Available | 0 | 5 |
| ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit | Oct 24, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 | 5 |
| Exploring TTS without T Using Biologically/Psychologically Motivated Neural Network Modules (ZeroSpeech 2020) | May 11, 2020 | Clustering, speech-recognition | Code Available | 0 | 5 |
| Comparison of Speech Representations for Automatic Quality Estimation in Multi-Speaker Text-to-Speech Synthesis | Feb 28, 2020 | Speech Synthesis, text-to-speech | Code Available | 0 | 5 |
| Extending Text-to-Speech Synthesis with Articulatory Movement Prediction using Ultrasound Tongue Imaging | Jul 12, 2021 | Prediction, Speech Synthesis | Code Available | 0 | 5 |
| Emotional Voice Conversion using Multitask Learning with Text-to-speech | Nov 11, 2019 | Decoder, text-to-speech | Code Available | 0 | 5 |
| EmoNews: A Spoken Dialogue System for Expressive News Conversations | Jun 16, 2025 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| ASSERT: Anti-Spoofing with Squeeze-Excitation and Residual neTworks | Apr 1, 2019 | Feature Engineering, text-to-speech | Code Available | 0 | 5 |
| Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation | Jul 7, 2024 | Text to Speech | Code Available | 0 | 5 |
| Emphasis Rendering for Conversational Text-to-Speech with Multi-modal Multi-scale Context Modeling | Oct 12, 2024 | text-to-speech, Text to Speech | Code Available | 0 | 5 |
| AI4D -- African Language Program | Apr 6, 2021 | Machine Translation, speech-recognition | Code Available | 0 | 5 |
| ECAPA-TDNN for Multi-speaker Text-to-speech Synthesis | Mar 20, 2022 | Speaker Verification, Speech Synthesis | Code Available | 0 | 5 |
| Effective parameter estimation methods for an ExcitNet model in generative text-to-speech systems | May 21, 2019 | parameter estimation, Speech Synthesis | Code Available | 0 | 5 |
| ClonEval: An Open Voice Cloning Benchmark | Apr 29, 2025 | text-to-speech, Text to Speech | Code Available | 0 | 5 |
| Empirical Evaluation of Deep Learning Model Compression Techniques on the WaveNet Vocoder | Nov 20, 2020 | Model Compression, Quantization | Code Available | 0 | 5 |
| FluentEditor2: Text-based Speech Editing by Modeling Multi-Scale Acoustic and Prosody Consistency | Sep 28, 2024 | Text to Speech | Code Available | 0 | 5 |
| Learning High-Frequency Functions Made Easy with Sinusoidal Positional Encoding | Jul 12, 2024 | regression, text-to-speech | Code Available | 0 | 5 |
| Clip-TTS: Contrastive Text-content and Mel-spectrogram, A High-Quality Text-to-Speech Method based on Contextual Semantic Understanding | Feb 26, 2025 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus | Feb 28, 2023 | Speech Synthesis, text-to-speech | Unverified | 0 | 0 |
| ArmanTTS single-speaker Persian dataset | Apr 7, 2023 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech | Apr 3, 2024 | Language Modeling, Language Modelling | Unverified | 0 | 0 |
| A Review of Multi-Modal Large Language and Vision Models | Mar 28, 2024 | Image Captioning, Prompt Engineering | Unverified | 0 | 0 |
| A Human-in-the-Loop Approach to Improving Cross-Text Prosody Transfer | Jun 6, 2024 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| CHULA TTS: A Modularized Text-To-Speech Framework | Dec 1, 2014 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network | May 17, 2019 | Decoder, Sentence | Unverified | 0 | 0 |
| A Review of Deep Learning Techniques for Speech Processing | Apr 30, 2023 | Automatic Speech Recognition, Deep Learning | Unverified | 0 | 0 |
| ChatAnything: Facetime Chat with LLM-Enhanced Personas | Nov 12, 2023 | Image Generation, In-Context Learning | Unverified | 0 | 0 |
| Character-Level Bangla Text-to-IPA Transcription Using Transformer Architecture with Sequence Alignment | Nov 7, 2023 | Decoder, Position | Unverified | 0 | 0 |
| A review-based study on different Text-to-Speech technologies | Dec 17, 2023 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| A Generative Model of a Pronunciation Lexicon for Hindi | May 6, 2017 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| A Cost Efficient Approach to Correct OCR Errors in Large Document Collections | May 28, 2019 | Clustering, Language Modelling | Unverified | 0 | 0 |
| Characteristic-Specific Partial Fine-Tuning for Efficient Emotion and Speaker Adaptation in Codec Language Text-to-Speech Models | Jan 24, 2025 | Emotion Classification, Speaker Identification | Unverified | 0 | 0 |
| Chain-of-Thought Training for Open E2E Spoken Dialogue Systems | May 31, 2025 | Language Modeling, Language Modelling | Unverified | 0 | 0 |
| CASSANDRA: A multipurpose configurable voice-enabled human-computer-interface | Apr 1, 2017 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| Arabic Text-To-Speech (TTS) Data Preparation | Apr 7, 2022 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| A Fully Time-domain Neural Model for Subband-based Speech Synthesizer | Oct 22, 2018 | text-to-speech, Text to Speech | Unverified | 0 | 0 |