| Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2 | Jul 19, 2024 | Audio Generation, Audio Synthesis | —Unverified | 0 |
| BreezyVoice: Adapting TTS for Taiwanese Mandarin with Enhanced Polyphone Disambiguation -- Challenges and Insights | Jan 29, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| Bridging the Gap: An Intermediate Language for Enhanced and Cost-Effective Grapheme-to-Phoneme Conversion with Homographs with Multiple Pronunciations Disambiguation | May 10, 2025 | Grapheme-to-Phoneme Conversion, Large Language Model | —Unverified | 0 |
| BTS: Back TranScription for Speech-to-Text Post-Processor using Text-to-Speech-to-Text | Aug 1, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| BUCEADOR, a multi-language search engine for digital libraries | May 1, 2012 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Building a Luganda Text-to-Speech Model From Crowdsourced Data | May 16, 2024 | Speech Enhancement, text-to-speech | —Unverified | 0 |
| Building a mixed-lingual neural TTS system with only monolingual data | Apr 12, 2019 | Decoder, text-to-speech | —Unverified | 0 |
| Building a synchronous corpus of acoustic and 3D facial marker data for adaptive audio-visual speech synthesis | May 1, 2012 | Audio-Visual Speech Recognition, Speech Recognition | —Unverified | 0 |
| Building Open Javanese and Sundanese Corpora for Multilingual Text-to-Speech | May 1, 2018 | Automatic Speech Recognition (ASR), Speech Recognition | —Unverified | 0 |
| Building Open-source Speech Technology for Low-resource Minority Languages with Sámi as an Example – Tools, Methods and Experiments | Jun 1, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Building Synthetic Speaker Profiles in Text-to-Speech Systems | Feb 7, 2022 | Diversity, text-to-speech | —Unverified | 0 |
| Building Text-to-Speech Systems for Resource Poor Languages | May 1, 2012 | Clustering, Speech Synthesis | —Unverified | 0 |
| Building Text-To-Speech Voices in the Cloud | May 1, 2012 | Speech Recognition, Speech Synthesis | —Unverified | 0 |
| Bunched LPCNet2: Efficient Neural Vocoders Covering Devices from Cloud to Edge | Mar 27, 2022 | Computational Efficiency, text-to-speech | —Unverified | 0 |
| Bunched LPCNet: Vocoder for Low-cost Neural Text-To-Speech Systems | Aug 11, 2020 | text-to-speech, Text to Speech | —Unverified | 0 |
| Burmese Speech Corpus, Finite-State Text Normalization and Pronunciation Grammars with an Application to Text-to-Speech | May 1, 2020 | Text Normalization, text-to-speech | —Unverified | 0 |
| BU-TTS: An Open-Source, Bilingual Welsh-English, Text-to-Speech Corpus | Jun 1, 2022 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Can DeepFake Speech be Reliably Detected? | Oct 9, 2024 | Face Swapping, Misinformation | —Unverified | 0 |
| Can Emotion Fool Anti-spoofing? | May 29, 2025 | Emotion Recognition, Speech Emotion Recognition | —Unverified | 0 |
| Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data? | Jun 11, 2024 | Contrastive Learning, Speech Synthesis | —Unverified | 0 |
| Can we reconstruct a dysarthric voice with the large speech model Parler TTS? | Jun 4, 2025 | text-to-speech, Text to Speech | —Unverified | 0 |
| Can we steal your vocal identity from the Internet?: Initial investigation of cloning Obama's voice using GAN, WaveNet and low-quality found data | Mar 2, 2018 | Generative Adversarial Network, Speech Enhancement | —Unverified | 0 |
| CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech | Jun 3, 2025 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| CASSANDRA: A multipurpose configurable voice-enabled human-computer-interface | Apr 1, 2017 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Chain-of-Thought Training for Open E2E Spoken Dialogue Systems | May 31, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| Characteristic-Specific Partial Fine-Tuning for Efficient Emotion and Speaker Adaptation in Codec Language Text-to-Speech Models | Jan 24, 2025 | Emotion Classification, Speaker Identification | —Unverified | 0 |
| Character-Level Bangla Text-to-IPA Transcription Using Transformer Architecture with Sequence Alignment | Nov 7, 2023 | Decoder, Position | —Unverified | 0 |
| ChatAnything: Facetime Chat with LLM-Enhanced Personas | Nov 12, 2023 | Image Generation, In-Context Learning | —Unverified | 0 |
| CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network | May 17, 2019 | Decoder, Sentence | —Unverified | 0 |
| CHULA TTS: A Modularized Text-To-Speech Framework | Dec 1, 2014 | text-to-speech, Text to Speech | —Unverified | 0 |
| CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech | Apr 3, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus | Feb 28, 2023 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Clip-TTS: Contrastive Text-content and Mel-spectrogram, A High-Quality Text-to-Speech Method based on Contextual Semantic Understanding | Feb 26, 2025 | text-to-speech, Text to Speech | —Unverified | 0 |
| CloneShield: A Framework for Universal Perturbation Against Zero-Shot Voice Cloning | May 25, 2025 | text-to-speech, Text to Speech | —Unverified | 0 |
| CML-TTS A Multilingual Dataset for Speech Synthesis in Low-Resource Languages | Jun 16, 2023 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Code-Mixed Text to Speech Synthesis under Low-Resource Constraints | Dec 2, 2023 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Code-Switching Text Generation and Injection in Mandarin-English ASR | Mar 20, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Combining Adversarial Training and Disentangled Speech Representation for Robust Zero-Resource Subword Modeling | Jun 17, 2019 | Representation Learning, Speech Representation Learning | —Unverified | 0 |
| Combining Automatic Speaker Verification and Prosody Analysis for Synthetic Speech Detection | Oct 31, 2022 | Audio Compression, Face Swapping | —Unverified | 0 |
| Combining Manual and Automatic Prosodic Annotation for Expressive Speech Synthesis | May 1, 2016 | Expressive Speech Synthesis, Speech Synthesis | —Unverified | 0 |
| ComedicSpeech: Text To Speech For Stand-up Comedies in Low-Resource Scenarios | May 20, 2023 | Rhythm, text-to-speech | —Unverified | 0 |
| Compact Neural TTS Voices for Accessibility | Jan 28, 2025 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Comparative Analysis of Transfer Learning in Deep Learning Text-to-Speech Models on a Few-Shot, Low-Resource, Customized Dataset | Oct 8, 2023 | text-to-speech, Text to Speech | —Unverified | 0 |
| Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech | Jul 31, 2023 | Acoustic Modelling, Speech Synthesis | —Unverified | 0 |
| Comparing the Benefit of Synthetic Training Data for Various Automatic Speech Recognition Architectures | Apr 12, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Comparison of Grapheme-to-Phoneme Conversion Methods on a Myanmar Pronunciation Dictionary | Dec 1, 2016 | Active Learning, Automatic Speech Recognition | —Unverified | 0 |
| Comparison of Speech Representations for the MOS Prediction System | Jun 28, 2022 | Self-Supervised Learning, text-to-speech | —Unverified | 0 |
| Compress Polyphone Pronunciation Prediction Model with Shared Labels | Oct 1, 2020 | Prediction, Quantization | —Unverified | 0 |
| Computer-assisted Pronunciation Training -- Speech synthesis is almost all you need | Jul 2, 2022 | All, Speech Synthesis | —Unverified | 0 |
| Conditioning Sequence-to-sequence Networks with Learned Activations | Sep 29, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |