| CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech | Jun 3, 2025 | Speech Synthesis, text-to-speech | —Unverified | 0 | 0 |
| A Proposal of Automatic Error Correction in Text | Sep 24, 2021 | Information Retrieval, Language Modelling | —Unverified | 0 | 0 |
| Can we steal your vocal identity from the Internet?: Initial investigation of cloning Obama's voice using GAN, WaveNet and low-quality found data | Mar 2, 2018 | Generative Adversarial Network, Speech Enhancement | —Unverified | 0 | 0 |
| Can we reconstruct a dysarthric voice with the large speech model Parler TTS? | Jun 4, 2025 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| A Preliminary Analysis of Automatic Word and Syllable Prominence Detection in Non-Native Speech With Text-to-Speech Prosody Embeddings | Dec 11, 2024 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| A Corpus of Neutral Voice Speech in Brazilian Portuguese | May 21, 2021 | Speech Synthesis, text-to-speech | —Unverified | 0 | 0 |
| A Bengali HMM Based Speech Synthesis System | Jun 16, 2014 | Speech Synthesis, text-to-speech | —Unverified | 0 | 0 |
| Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data? | Jun 11, 2024 | Contrastive Learning, Speech Synthesis | —Unverified | 0 | 0 |
| Can Emotion Fool Anti-spoofing? | May 29, 2025 | Emotion Recognition, Speech Emotion Recognition | —Unverified | 0 | 0 |
| A Practical Guide to Logical Access Voice Presentation Attack Detection | Jan 10, 2022 | Artifact Detection, Speaker Verification | —Unverified | 0 | 0 |
| Can DeepFake Speech be Reliably Detected? | Oct 9, 2024 | Face Swapping, Misinformation | —Unverified | 0 | 0 |
| BU-TTS: An Open-Source, Bilingual Welsh-English, Text-to-Speech Corpus | Jun 1, 2022 | Speech Synthesis, text-to-speech | —Unverified | 0 | 0 |
| Applying Syntax–Prosody Mapping Hypothesis and Prosodic Well-Formedness Constraints to Neural Sequence-to-Sequence Speech Synthesis | Mar 29, 2022 | Speech Synthesis, text-to-speech | —Unverified | 0 | 0 |
| A Framework for Synthetic Audio Conversations Generation using Large Language Models | Sep 2, 2024 | Audio Classification, Audio Tagging | —Unverified | 0 | 0 |
| Burmese Speech Corpus, Finite-State Text Normalization and Pronunciation Grammars with an Application to Text-to-Speech | May 1, 2020 | Text Normalization, text-to-speech | —Unverified | 0 | 0 |
| Bunched LPCNet : Vocoder for Low-cost Neural Text-To-Speech Systems | Aug 11, 2020 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| Bunched LPCNet2: Efficient Neural Vocoders Covering Devices from Cloud to Edge | Mar 27, 2022 | Computational Efficiency, text-to-speech | —Unverified | 0 | 0 |
| Building Text-To-Speech Voices in the Cloud | May 1, 2012 | Speech Recognition, Speech Synthesis | —Unverified | 0 | 0 |
| Applying Feature Underspecified Lexicon Phonological Features in Multilingual Text-to-Speech | Apr 14, 2022 | Language Acquisition, text-to-speech | —Unverified | 0 | 0 |
| AffectEcho: Speaker Independent and Language-Agnostic Emotion and Affect Transfer for Speech Synthesis | Aug 16, 2023 | Attribute, Speech Synthesis | —Unverified | 0 | 0 |
| Building Text-to-Speech Systems for Resource Poor Languages | May 1, 2012 | Clustering, Speech Synthesis | —Unverified | 0 | 0 |
| Building Synthetic Speaker Profiles in Text-to-Speech Systems | Feb 7, 2022 | Diversity, text-to-speech | —Unverified | 0 | 0 |
| Applying Automated Machine Translation to Educational Video Courses | Jan 9, 2023 | Machine Translation, Speech Synthesis | —Unverified | 0 | 0 |
| Building Open-source Speech Technology for Low-resource Minority Languages with Sámi as an Example – Tools, Methods and Experiments | Jun 1, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| Building Open Javanese and Sundanese Corpora for Multilingual Text-to-Speech | May 1, 2018 | Automatic Speech Recognition (ASR), Speech Recognition | —Unverified | 0 | 0 |
| Application of ASV for Voice Identification after VC and Duration Predictor Improvement in TTS Models | Jun 27, 2024 | Speaker Verification, text-to-speech | —Unverified | 0 | 0 |
| AE-Flow: AutoEncoder Normalizing Flow | Dec 27, 2023 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| Building a synchronous corpus of acoustic and 3D facial marker data for adaptive audio-visual speech synthesis | May 1, 2012 | Audio-Visual Speech Recognition, Speech Recognition | —Unverified | 0 | 0 |
| Building a mixed-lingual neural TTS system with only monolingual data | Apr 12, 2019 | Decoder, text-to-speech | —Unverified | 0 | 0 |
| A Polyphone BERT for Polyphone Disambiguation in Mandarin Chinese | Jul 1, 2022 | Polyphone disambiguation, text-to-speech | —Unverified | 0 | 0 |
| Emotional Prosody Control for Speech Generation | Nov 7, 2021 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| EmoTalkingGaussian: Continuous Emotion-conditioned Talking Head Synthesis | Feb 2, 2025 | Self-Supervised Learning, SSIM | —Unverified | 0 | 0 |
| Building a Luganda Text-to-Speech Model From Crowdsourced Data | May 16, 2024 | Speech Enhancement, text-to-speech | —Unverified | 0 | 0 |
| 基於字元階層之語音合成用文脈訊息擷取 (Character-Level Linguistic Features Extraction for Text-to-Speech System) [In Chinese] | Oct 1, 2016 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| 台語古詩朗誦系統 (A Taiwanese Text-to-Speech System for Ancient Poems) [In Chinese] | Oct 1, 2018 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| A Context-Based Numerical Format Prediction for a Text-To-Speech System | Nov 19, 2024 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| AASIST3: KAN-Enhanced AASIST Speech Deepfake Detection using SSL Features and Additional Regularization for the ASVspoof 2024 Challenge | Aug 30, 2024 | DeepFake Detection, Face Swapping | —Unverified | 0 | 0 |
| BUCEADOR, a multi-language search engine for digital libraries | May 1, 2012 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| EmoSpeech: A Corpus of Emotionally Rich and Contextually Detailed Speech Annotations | Dec 9, 2024 | text-to-speech, Text to Speech | —Unverified | 0 | 0 |
| 基於字元階層之語音合成用文脈訊息擷取 (Character-Level Linguistic Features Extraction for Text-to-Speech System) [In Chinese] | Dec 1, 2016 | Feature Engineering, Speech Synthesis | —Unverified | 0 | 0 |
| BTS: Back TranScription for Speech-to-Text Post-Processor using Text-to-Speech-to-Text | Aug 1, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 | 0 |
| Emo-DPO: Controllable Emotional Speech Synthesis through Direct Preference Optimization | Sep 16, 2024 | Emotional Speech Synthesis, In-Context Learning | —Unverified | 0 | 0 |
| EmoDiff: Intensity Controllable Emotional Text-to-Speech with Soft-Label Guidance | Nov 17, 2022 | Denoising, text-to-speech | —Unverified | 0 | 0 |
| Grad-StyleSpeech: Any-speaker Adaptive Text-to-Speech Synthesis with Diffusion Models | Nov 17, 2022 | Speech Synthesis, text-to-speech | —Unverified | 0 | 0 |
| Adversarial training of Keyword Spotting to Minimize TTS Data Overfitting | Aug 20, 2024 | Keyword Spotting, text-to-speech | —Unverified | 0 | 0 |
| EmoCat: Language-agnostic Emotional Voice Conversion | Jan 14, 2021 | Decoder, text-to-speech | —Unverified | 0 | 0 |
| Bridging the Gap: An Intermediate Language for Enhanced and Cost-Effective Grapheme-to-Phoneme Conversion with Homographs with Multiple Pronunciations Disambiguation | May 10, 2025 | Grapheme-to-Phoneme Conversion, Large Language Model | —Unverified | 0 | 0 |
| Emotional Dimension Control in Language Model-Based Text-to-Speech: Spanning a Broad Spectrum of Human Emotions | Sep 25, 2024 | Attribute, Dimensionality Reduction | —Unverified | 0 | 0 |
| BreezyVoice: Adapting TTS for Taiwanese Mandarin with Enhanced Polyphone Disambiguation -- Challenges and Insights | Jan 29, 2025 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person | Aug 9, 2021 | Talking Head Generation, text-to-speech | —Unverified | 0 | 0 |