| ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading | Jul 3, 2023 | Form, Sentence | —Unverified | 0 |
| Contextual Expressive Text-to-Speech | Nov 26, 2022 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Continual Learning in Machine Speech Chain Using Gradient Episodic Memory | Nov 27, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Continual Speaker Adaptation for Text-to-Speech Synthesis | Mar 26, 2021 | Continual Learning, Diversity | —Unverified | 0 |
| Semi-supervised learning for continuous emotional intensity controllable speech synthesis with disentangled representations | Nov 11, 2022 | Emotional Speech Synthesis, Speech Synthesis | —Unverified | 0 |
| Continuous Expressive Speaking Styles Synthesis based on CVSM and MR-HMM | Dec 1, 2016 | Expressive Speech Synthesis, Speech Recognition | —Unverified | 0 |
| Continuous Speech Synthesis using per-token Latent Diffusion | Oct 21, 2024 | Image Generation, Quantization | —Unverified | 0 |
| Controllable Accented Text-to-Speech Synthesis | Sep 22, 2022 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Controllable Emphasis with zero data for text-to-speech | Jul 13, 2023 | Sentence, text-to-speech | —Unverified | 0 |
| Controllable neural text-to-speech synthesis using intuitive prosodic features | Sep 14, 2020 | Sentence, Speech Synthesis | —Unverified | 0 |
| Controllable speech synthesis by learning discrete phoneme-level prosodic representations | Nov 29, 2022 | Clustering, Speech Synthesis | —Unverified | 0 |
| Controlling Emotion in Text-to-Speech with Natural Language Prompts | Jun 10, 2024 | text-to-speech, Text to Speech | —Unverified | 0 |
| Controllable Prosody Generation With Partial Inputs | Mar 14, 2023 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Controlling Prosody in End-to-End TTS: A Case Study on Contrastive Focus Generation | Nov 1, 2021 | text-to-speech, Text to Speech | —Unverified | 0 |
| CopyCat: Many-to-Many Fine-Grained Prosody Transfer for Neural Text-to-Speech | Apr 30, 2020 | Rhythm, text-to-speech | —Unverified | 0 |
| Corpus Generation for Voice Command in Smart Home and the Effect of Speech Synthesis on End-to-End SLU | May 1, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Counterfactual Activation Editing for Post-hoc Prosody and Mispronunciation Correction in TTS Models | Jun 1, 2025 | counterfactual, Speech Synthesis | —Unverified | 0 |
| Learning Speech Representation From Contrastive Token-Acoustic Pretraining | Sep 1, 2023 | Audio Classification, Automatic Speech Recognition | —Unverified | 0 |
| Creating an African American-Sounding TTS: Guidelines, Technical Challenges, and Surprising Evaluations | Mar 17, 2024 | Attribute, text-to-speech | —Unverified | 0 |
| Creating New Language and Voice Components for the Updated MaryTTS Text-to-Speech Synthesis Platform | Dec 13, 2017 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Creating New Voices using Normalizing Flows | Dec 22, 2023 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Cross-Dialect Text-To-Speech in Pitch-Accent Language Incorporating Multi-Dialect Phoneme-Level BERT | Sep 11, 2024 | text-to-speech, Text to Speech | —Unverified | 0 |
| Cross-Domain Audio Deepfake Detection: Dataset and Analysis | Apr 7, 2024 | Audio Deepfake Detection, DeepFake Detection | —Unverified | 0 |
| Cross-lingual Knowledge Distillation via Flow-based Voice Conversion for Robust Polyglot Text-To-Speech | Sep 15, 2023 | Knowledge Distillation, Speech Synthesis | —Unverified | 0 |
| Cross-lingual Multi-speaker Text-to-speech Synthesis for Voice Cloning without Using Parallel Corpus for Unseen Speakers | Nov 26, 2019 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Cross-lingual Multispeaker Text-to-Speech under Limited-Data Scenario | May 21, 2020 | Attribute, Speech Synthesis | —Unverified | 0 |
| Cross-Lingual Text-to-Speech Using Multi-Task Learning and Speaker Classifier Joint Training | Jan 20, 2022 | Multi-Task Learning, Speech Synthesis | —Unverified | 0 |
| Cross-lingual Text-To-Speech with Flow-based Voice Conversion for Improved Pronunciation | Oct 31, 2022 | Decoder, Disentanglement | —Unverified | 0 |
| Cross-Lingual Transfer Learning for Phrase Break Prediction with Multilingual Language Model | Jun 5, 2023 | Cross-Lingual Transfer, Language Modeling | —Unverified | 0 |
| Cross-speaker Emotion Transfer by Manipulating Speech Style Latents | Mar 15, 2023 | text-to-speech, Text to Speech | —Unverified | 0 |
| Cross-Speaker Emotion Transfer for Low-Resource Text-to-Speech Using Non-Parallel Voice Conversion with Pitch-Shift Data Augmentation | Apr 21, 2022 | Data Augmentation, text-to-speech | —Unverified | 0 |
| Cross-speaker style transfer for text-to-speech using data augmentation | Feb 10, 2022 | Data Augmentation, Style Transfer | —Unverified | 0 |
| Cross-speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis | Jul 27, 2021 | Expressive Speech Synthesis, Speech Synthesis | —Unverified | 0 |
| CrossSpeech: Speaker-independent Acoustic Representation for Cross-lingual Speech Synthesis | Feb 28, 2023 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech | Nov 16, 2021 | Diversity, text-to-speech | —Unverified | 0 |
| Cross-Utterance Conditioned VAE for Speech Generation | Sep 8, 2023 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Crowdsourcing Latin American Spanish for Low-Resource Text-to-Speech | May 1, 2020 | text-to-speech, Text to Speech | —Unverified | 0 |
| CSSinger: End-to-End Chunkwise Streaming Singing Voice Synthesis System Based on Conditional Variational Autoencoder | Dec 12, 2024 | Audio Synthesis, Singing Voice Synthesis | —Unverified | 0 |
| Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis | Jun 15, 2021 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Cued Speech Generation Leveraging a Pre-trained Audiovisual Text-to-Speech Model | Jan 8, 2025 | text-to-speech, Text to Speech | —Unverified | 0 |
| CUIfy the XR: An Open-Source Package to Embed LLM-powered Conversational Agents in XR | Nov 7, 2024 | Language Modelling, Large Language Model | —Unverified | 0 |
| Customizing Grapheme-to-Phoneme System for Non-Trivial Transcription Problems in Bangla Language | Jun 1, 2019 | speech-recognition, Speech Recognition | —Unverified | 0 |
| Cycle-consistency training for end-to-end speech recognition | Nov 2, 2018 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Daisy-TTS: Simulating Wider Spectrum of Emotions via Prosody Embedding Decomposition | Feb 22, 2024 | text-to-speech, Text to Speech | —Unverified | 0 |
| DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech | Oct 17, 2024 | Disentanglement, Quantization | —Unverified | 0 |
| DASB -- Discrete Audio and Speech Benchmark | Jun 20, 2024 | Benchmarking, Emotion Recognition | —Unverified | 0 |
| Data Augmentation Methods for End-to-end Speech Recognition on Distant-Talk Scenarios | Jun 7, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Data Center Audio/Video Intelligence on Device (DAVID) -- An Edge-AI Platform for Smart-Toys | Nov 18, 2023 | text-to-speech, Text to Speech | —Unverified | 0 |
| Data-driven grapheme-to-phoneme representations for a lexicon-free text-to-speech | Jan 19, 2024 | Self-Supervised Learning, text-to-speech | —Unverified | 0 |
| Data Efficient Voice Cloning for Neural Singing Synthesis | Feb 19, 2019 | text-to-speech, Text to Speech | —Unverified | 0 |