| D-CAPTCHA++: A Study of Resilience of Deepfake CAPTCHA under Transferable Imperceptible Adversarial Attack | Sep 11, 2024 | Adversarial Attack, Audio Synthesis | —Unverified | 0 |
| Augmentation through Laundering Attacks for Audio Spoof Detection | Oct 1, 2024 | Data Augmentation, Face Swapping | —Unverified | 0 |
| Data Redaction from Conditional Generative Models | May 18, 2023 | text-to-speech, Text to Speech | —Unverified | 0 |
| Data Processing for Optimizing Naturalness of Vietnamese Text-to-speech System | Apr 20, 2020 | text-to-speech, Text to Speech | —Unverified | 0 |
| AudioVisual Speech Synthesis: A brief literature review | Feb 18, 2021 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Data Efficient Voice Cloning for Neural Singing Synthesis | Feb 19, 2019 | text-to-speech, Text to Speech | —Unverified | 0 |
| Data-driven grapheme-to-phoneme representations for a lexicon-free text-to-speech | Jan 19, 2024 | Self-Supervised Learning, text-to-speech | —Unverified | 0 |
| AdaSpeech 3: Adaptive Text to Speech for Spontaneous Style | Jul 6, 2021 | Decoder, Mixture-of-Experts | —Unverified | 0 |
| Accented Text-to-Speech Synthesis with Limited Data | May 8, 2023 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| GraphTTS: graph-to-sequence modelling in neural text-to-speech | Mar 4, 2020 | Graph Embedding, Graph-to-Sequence | —Unverified | 0 |
| Data Center Audio/Video Intelligence on Device (DAVID) -- An Edge-AI Platform for Smart-Toys | Nov 18, 2023 | text-to-speech, Text to Speech | —Unverified | 0 |
| Data Augmentation Methods for End-to-end Speech Recognition on Distant-Talk Scenarios | Jun 7, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| DASB -- Discrete Audio and Speech Benchmark | Jun 20, 2024 | Benchmarking, Emotion Recognition | —Unverified | 0 |
| DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech | Oct 17, 2024 | Disentanglement, Quantization | —Unverified | 0 |
| Analysis and Utilization of Entrainment on Acoustic and Emotion Features in User-agent Dialogue | Dec 7, 2022 | Spoken Dialogue Systems, text-to-speech | —Unverified | 0 |
| Daisy-TTS: Simulating Wider Spectrum of Emotions via Prosody Embedding Decomposition | Feb 22, 2024 | text-to-speech, Text to Speech | —Unverified | 0 |
| Cycle-consistency training for end-to-end speech recognition | Nov 2, 2018 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Customizing Grapheme-to-Phoneme System for Non-Trivial Transcription Problems in Bangla Language | Jun 1, 2019 | speech-recognition, Speech Recognition | —Unverified | 0 |
| AudioJailbreak: Jailbreak Attacks against End-to-End Large Audio-Language Models | May 20, 2025 | text-to-speech, Text to Speech | —Unverified | 0 |
| An Algorithm Based on Empirical Methods, for Text-to-Tuneful-Speech Synthesis of Sanskrit Verse | Sep 15, 2014 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| CUIfy the XR: An Open-Source Package to Embed LLM-powered Conversational Agents in XR | Nov 7, 2024 | Language Modelling, Large Language Model | —Unverified | 0 |
| Cued Speech Generation Leveraging a Pre-trained Audiovisual Text-to-Speech Model | Jan 8, 2025 | text-to-speech, Text to Speech | —Unverified | 0 |
| A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI | Mar 23, 2023 | Speech Enhancement, Speech Synthesis | —Unverified | 0 |
| Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis | Jun 15, 2021 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| CSSinger: End-to-End Chunkwise Streaming Singing Voice Synthesis System Based on Conditional Variational Autoencoder | Dec 12, 2024 | Audio Synthesis, Singing Voice Synthesis | —Unverified | 0 |
| An adaptable task-oriented dialog system for stand-alone embedded devices | Jul 1, 2019 | Dialogue Management, Management | —Unverified | 0 |
| Audio Deep Fake Detection System with Neural Stitching for ADD 2022 | Apr 19, 2022 | text-to-speech, Text to Speech | —Unverified | 0 |
| Crowdsourcing Latin American Spanish for Low-Resource Text-to-Speech | May 1, 2020 | text-to-speech, Text to Speech | —Unverified | 0 |
| AMuSeD: An Attentive Deep Neural Network for Multimodal Sarcasm Detection Incorporating Bi-modal Data Augmentation | Dec 13, 2024 | Data Augmentation, Sarcasm Detection | —Unverified | 0 |
| Cross-Utterance Conditioned VAE for Speech Generation | Sep 8, 2023 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Audio-conditioned phonemic and prosodic annotation for building text-to-speech models from unlabeled speech data | Jun 12, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Adaptive re-calibration of channel-wise features for Adversarial Audio Classification | Oct 21, 2022 | Audio Classification, Face Swapping | —Unverified | 0 |
| Accent conversion using discrete units with parallel data synthesized from controllable accented TTS | Sep 30, 2024 | Data Augmentation, Speech Synthesis | —Unverified | 0 |
| Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation | Aug 1, 2024 | Representation Learning, Speech Synthesis | —Unverified | 0 |
| GRASS: Unified Generation Model for Speech-to-Semantic Tasks | Sep 6, 2023 | named-entity-recognition, Named Entity Recognition | —Unverified | 0 |
| Guided-TTS: A Diffusion Model for Text-to-Speech via Classifier Guidance | Nov 23, 2021 | speech-recognition, Speech Recognition | —Unverified | 0 |
| Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech | Nov 16, 2021 | Diversity, text-to-speech | —Unverified | 0 |
| Audiobook Dialogues as Training Data for Conversational Style Synthetic Voices | Jun 1, 2022 | Sentence, text-to-speech | —Unverified | 0 |
| CrossSpeech: Speaker-independent Acoustic Representation for Cross-lingual Speech Synthesis | Feb 28, 2023 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Cross-speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis | Jul 27, 2021 | Expressive Speech Synthesis, Speech Synthesis | —Unverified | 0 |
| AttS2S-VC: Sequence-to-Sequence Voice Conversion with Attention and Context Preservation Mechanisms | Nov 9, 2018 | GPU, Image Captioning | —Unverified | 0 |
| A multi-speaker multi-lingual voice cloning system based on vits2 for limmits 2024 challenge | Jun 22, 2024 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Cross-speaker style transfer for text-to-speech using data augmentation | Feb 10, 2022 | Data Augmentation, Style Transfer | —Unverified | 0 |
| Cross-Speaker Emotion Transfer for Low-Resource Text-to-Speech Using Non-Parallel Voice Conversion with Pitch-Shift Data Augmentation | Apr 21, 2022 | Data Augmentation, text-to-speech | —Unverified | 0 |
| Cross-speaker Emotion Transfer by Manipulating Speech Style Latents | Mar 15, 2023 | text-to-speech, Text to Speech | —Unverified | 0 |
| A multilingual training strategy for low resource Text to Speech | Sep 2, 2024 | Cross-Lingual Transfer, text-to-speech | —Unverified | 0 |
| Adapting TTS models For New Speakers using Transfer Learning | Oct 12, 2021 | text-to-speech, Text to Speech | —Unverified | 0 |
| A Multi-Agent Framework for Automated Qinqiang Opera Script Generation Using Large Language Models | Apr 22, 2025 | cross-modal alignment, Script Generation | —Unverified | 0 |
| Cross-Lingual Transfer Learning for Phrase Break Prediction with Multilingual Language Model | Jun 5, 2023 | Cross-Lingual Transfer, Language Modeling | —Unverified | 0 |
| Accent Conversion in Text-To-Speech Using Multi-Level VAE and Adversarial Training | Jun 3, 2024 | Speech Synthesis, text-to-speech | —Unverified | 0 |