| Can Emotion Fool Anti-spoofing? | May 29, 2025 | Emotion Recognition, Speech Emotion Recognition | Unverified | 0 |
| A Framework for Synthetic Audio Conversations Generation using Large Language Models | Sep 2, 2024 | Audio Classification, Audio Tagging | Unverified | 0 |
| Can DeepFake Speech be Reliably Detected? | Oct 9, 2024 | Face Swapping, Misinformation | Unverified | 0 |
| BU-TTS: An Open-Source, Bilingual Welsh-English, Text-to-Speech Corpus | Jun 1, 2022 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Applying Syntax–Prosody Mapping Hypothesis and Prosodic Well-Formedness Constraints to Neural Sequence-to-Sequence Speech Synthesis | Mar 29, 2022 | Speech Synthesis, text-to-speech | Unverified | 0 |
| AASIST3: KAN-Enhanced AASIST Speech Deepfake Detection using SSL Features and Additional Regularization for the ASVspoof 2024 Challenge | Aug 30, 2024 | DeepFake Detection, Face Swapping | Unverified | 0 |
| Burmese Speech Corpus, Finite-State Text Normalization and Pronunciation Grammars with an Application to Text-to-Speech | May 1, 2020 | Text Normalization, text-to-speech | Unverified | 0 |
| Bunched LPCNet: Vocoder for Low-cost Neural Text-To-Speech Systems | Aug 11, 2020 | text-to-speech, Text to Speech | Unverified | 0 |
| AffectEcho: Speaker Independent and Language-Agnostic Emotion and Affect Transfer for Speech Synthesis | Aug 16, 2023 | Attribute, Speech Synthesis | Unverified | 0 |
| Bunched LPCNet2: Efficient Neural Vocoders Covering Devices from Cloud to Edge | Mar 27, 2022 | Computational Efficiency, text-to-speech | Unverified | 0 |
| Building Text-To-Speech Voices in the Cloud | May 1, 2012 | Speech Recognition, Speech Synthesis | Unverified | 0 |
| Applying Feature Underspecified Lexicon Phonological Features in Multilingual Text-to-Speech | Apr 14, 2022 | Language Acquisition, text-to-speech | Unverified | 0 |
| A Context-Based Numerical Format Prediction for a Text-To-Speech System | Nov 19, 2024 | text-to-speech, Text to Speech | Unverified | 0 |
| Building Text-to-Speech Systems for Resource Poor Languages | May 1, 2012 | Clustering, Speech Synthesis | Unverified | 0 |
| Building Synthetic Speaker Profiles in Text-to-Speech Systems | Feb 7, 2022 | Diversity, text-to-speech | Unverified | 0 |
| Applying Automated Machine Translation to Educational Video Courses | Jan 9, 2023 | Machine Translation, Speech Synthesis | Unverified | 0 |
| Building Open-source Speech Technology for Low-resource Minority Languages with Sámi as an Example – Tools, Methods and Experiments | Jun 1, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Building Open Javanese and Sundanese Corpora for Multilingual Text-to-Speech | May 1, 2018 | Automatic Speech Recognition (ASR), Speech Recognition | Unverified | 0 |
| Application of ASV for Voice Identification after VC and Duration Predictor Improvement in TTS Models | Jun 27, 2024 | Speaker Verification, text-to-speech | Unverified | 0 |
| AE-Flow: AutoEncoder Normalizing Flow | Dec 27, 2023 | text-to-speech, Text to Speech | Unverified | 0 |
| Building a synchronous corpus of acoustic and 3D facial marker data for adaptive audio-visual speech synthesis | May 1, 2012 | Audio-Visual Speech Recognition, Speech Recognition | Unverified | 0 |
| Building a mixed-lingual neural TTS system with only monolingual data | Apr 12, 2019 | Decoder, text-to-speech | Unverified | 0 |
| A Polyphone BERT for Polyphone Disambiguation in Mandarin Chinese | Jul 1, 2022 | Polyphone disambiguation, text-to-speech | Unverified | 0 |
| Building a Luganda Text-to-Speech Model From Crowdsourced Data | May 16, 2024 | Speech Enhancement, text-to-speech | Unverified | 0 |
| 基於字元階層之語音合成用文脈訊息擷取 (Character-Level Linguistic Features Extraction for Text-to-Speech System) [In Chinese] | Oct 1, 2016 | text-to-speech, Text to Speech | Unverified | 0 |
| 台語古詩朗誦系統 (A Taiwanese Text-to-Speech System for Ancient Poems) [In Chinese] | Oct 1, 2018 | text-to-speech, Text to Speech | Unverified | 0 |
| DiffVoice: Text-to-Speech with Latent Diffusion | Apr 23, 2023 | text-to-speech, Text to Speech | Unverified | 0 |
| Direct Speech to Speech Translation: A Review | Mar 3, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| BUCEADOR, a multi-language search engine for digital libraries | May 1, 2012 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| 基於字元階層之語音合成用文脈訊息擷取 (Character-Level Linguistic Features Extraction for Text-to-Speech System) [In Chinese] | Dec 1, 2016 | Feature Engineering, Speech Synthesis | Unverified | 0 |
| BTS: Back TranScription for Speech-to-Text Post-Processor using Text-to-Speech-to-Text | Aug 1, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Grad-StyleSpeech: Any-speaker Adaptive Text-to-Speech Synthesis with Diffusion Models | Nov 17, 2022 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Adversarial training of Keyword Spotting to Minimize TTS Data Overfitting | Aug 20, 2024 | Keyword Spotting, text-to-speech | Unverified | 0 |
| Bridging the Gap: An Intermediate Language for Enhanced and Cost-Effective Grapheme-to-Phoneme Conversion with Homographs with Multiple Pronunciations Disambiguation | May 10, 2025 | Grapheme-to-Phoneme Conversion, Large Language Model | Unverified | 0 |
| BreezyVoice: Adapting TTS for Taiwanese Mandarin with Enhanced Polyphone Disambiguation -- Challenges and Insights | Jan 29, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person | Aug 9, 2021 | Talking Head Generation, text-to-speech | Unverified | 0 |
| Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2 | Jul 19, 2024 | Audio Generation, Audio Synthesis | Unverified | 0 |
| Bootstrapping non-parallel voice conversion from speaker-adaptive text-to-speech | Sep 14, 2019 | text-to-speech, Text to Speech | Unverified | 0 |
| Anti-Spoofing Using Transfer Learning with Variational Information Bottleneck | Apr 4, 2022 | Speaker Verification, text-to-speech | Unverified | 0 |
| Adversarial speech for voice privacy protection from Personalized Speech generation | Jan 22, 2024 | Speaker Verification, text-to-speech | Unverified | 0 |
| A Comparative Analysis of Pretrained Language Models for Text-to-Speech | Sep 4, 2023 | Natural Language Understanding, Prediction | Unverified | 0 |
| Bootstrap an end-to-end ASR system by multilingual training, transfer learning, text-to-text mapping and synthetic audio | Nov 25, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Boosting Large Language Model for Speech Synthesis: An Empirical Study | Dec 30, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| An overview of text-to-speech systems and media applications | Oct 22, 2023 | Acoustic Modelling, text-to-speech | Unverified | 0 |
| Boosting Diffusion Model for Spectrogram Up-sampling in Text-to-speech: An Empirical Study | Jun 7, 2024 | Diversity, Language Modeling | Unverified | 0 |
| BOFFIN TTS: Few-Shot Speaker Adaptation by Bayesian Optimization | Feb 4, 2020 | Bayesian Optimization, text-to-speech | Unverified | 0 |
| An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era | Oct 6, 2022 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Adversarial Speaker-Consistency Learning Using Untranscribed Speech Data for Zero-Shot Multi-Speaker Text-to-Speech | Oct 12, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| BiVocoder: A Bidirectional Neural Vocoder Integrating Feature Extraction and Waveform Generation | Jun 4, 2024 | text-to-speech, Text to Speech | Unverified | 0 |
| BitTTS: Highly Compact Text-to-Speech Using 1.58-bit Quantization and Weight Indexing | Jun 4, 2025 | Quantization, text-to-speech | Unverified | 0 |