| HLTCOE JHU Submission to the Voice Privacy Challenge 2024 | Sep 13, 2024 | text-to-speech, Text to Speech | —Unverified | 0 |
| Cross-Dialect Text-To-Speech in Pitch-Accent Language Incorporating Multi-Dialect Phoneme-Level BERT | Sep 11, 2024 | text-to-speech, Text to Speech | —Unverified | 0 |
| HMM-based data augmentation for E2E systems for building conversational speech synthesis systems | Dec 22, 2022 | Data Augmentation, Language Modeling | —Unverified | 0 |
| Generative Data Augmentation Challenge: Zero-Shot Speech Synthesis for Personalized Speech Enhancement | Jan 23, 2025 | Data Augmentation, Speech Enhancement | —Unverified | 0 |
| Creating New Voices using Normalizing Flows | Dec 22, 2023 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech | Apr 30, 2024 | Decoder, text-to-speech | —Unverified | 0 |
| Human-in-the-loop Speaker Adaptation for DNN-based Multi-speaker TTS | Jun 21, 2022 | text-to-speech, Text to Speech | —Unverified | 0 |
| Huqariq: A Multilingual Speech Corpus of Native Languages of Peru for Speech Recognition | Jul 12, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Generative Adversarial Network based Speaker Adaptation for High Fidelity WaveNet Vocoder | Dec 6, 2018 | Generative Adversarial Network, text-to-speech | —Unverified | 0 |
| HybridNet: A Hybrid Neural Architecture to Speed-up Autoregressive Models | Jan 1, 2018 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Creating New Language and Voice Components for the Updated MaryTTS Text-to-Speech Synthesis Platform | Dec 13, 2017 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Generative adversarial network-based glottal waveform model for statistical parametric speech synthesis | Mar 14, 2019 | Generative Adversarial Network, Speech Synthesis | —Unverified | 0 |
| Creating an African American-Sounding TTS: Guidelines, Technical Challenges, and Surprising Evaluations | Mar 17, 2024 | Attribute, text-to-speech | —Unverified | 0 |
| Attempt Towards Stress Transfer in Speech-to-Speech Machine Translation | Mar 7, 2024 | Diversity, Machine Translation | —Unverified | 0 |
| Impact of Frame Rates on Speech Tokenizer: A Case Study on Mandarin and English | May 20, 2025 | Automatic Speech Recognition, speech-recognition | —Unverified | 0 |
| Improve Cross-lingual Voice Cloning Using Low-quality Code-switched Data | Oct 14, 2021 | text-to-speech, Text to Speech | —Unverified | 0 |
| A Melody-Unsupervision Model for Singing Voice Synthesis | Oct 13, 2021 | model, Singing Voice Synthesis | —Unverified | 0 |
| Intelligibility of Text-to-Speech Systems for Mathematical Expressions | Jun 5, 2025 | text-to-speech, Text to Speech | —Unverified | 0 |
| Improve few-shot voice cloning using multi-modal learning | Mar 18, 2022 | text-to-speech, Text to Speech | —Unverified | 0 |
| Improving Accent Conversion with Reference Encoder and End-To-End Text-To-Speech | May 19, 2020 | text-to-speech, Text to Speech | —Unverified | 0 |
| Generating Synthetic Audio Data for Attention-Based Speech Recognition Systems | Dec 19, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Generating Rich Product Descriptions for Conversational E-commerce Systems | Nov 30, 2021 | Sentence, text-to-speech | —Unverified | 0 |
| Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation | Jun 14, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Improving Contextual Recognition of Rare Words with an Alternate Spelling Prediction Model | Sep 2, 2022 | text-to-speech, Text to Speech | —Unverified | 0 |
| Improving Cross-lingual Speech Synthesis with Triplet Training Scheme | Feb 22, 2022 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Improving Deliberation by Text-Only and Semi-Supervised Training | Jun 29, 2022 | Decoder, Language Modeling | —Unverified | 0 |
| Learning Speech Representation From Contrastive Token-Acoustic Pretraining | Sep 1, 2023 | Audio Classification, Automatic Speech Recognition | —Unverified | 0 |
| Improving Grapheme-to-Phoneme Conversion through In-Context Knowledge Retrieval with Large Language Models | Nov 12, 2024 | Grapheme-to-Phoneme Conversion, Retrieval | —Unverified | 0 |
| Generating Narrated Lecture Videos from Slides with Synchronized Highlights | May 5, 2025 | Math, text-to-speech | —Unverified | 0 |
| Improving Low Resource Code-switched ASR using Augmented Code-switched TTS | Oct 12, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Improving LPCNet-based Text-to-Speech with Linear Prediction-structured Mixture Density Network | Jan 31, 2020 | Quantization, Speech Synthesis | —Unverified | 0 |
| Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information | Aug 31, 2023 | Decoder, Multi-Task Learning | —Unverified | 0 |
| Improving multi-speaker TTS prosody variance with a residual encoder and normalizing flows | Jun 10, 2021 | Disentanglement, Sentence | —Unverified | 0 |
| Improving Noise Robustness of LLM-based Zero-shot TTS via Discrete Acoustic Token Denoising | May 20, 2025 | Decoder, Denoising | —Unverified | 0 |
| Improving Performance of End-to-End ASR on Numeric Sequences | Jul 1, 2019 | speech-recognition, Speech Recognition | —Unverified | 0 |
| Improving prosodic phrasing of Vietnamese text-to-speech systems | Dec 1, 2020 | text-to-speech, Text to Speech | —Unverified | 0 |
| Improving Prosody Modelling with Cross-Utterance BERT Embeddings for End-to-end Speech Synthesis | Nov 6, 2020 | Decoder, Sentence | —Unverified | 0 |
| Improving Readability for Automatic Speech Recognition Transcription | Apr 9, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Generating Multilingual Voices Using Speaker Space Translation Based on Bilingual Speaker Data | Apr 10, 2020 | text-to-speech, Text to Speech | —Unverified | 0 |
| Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment | Jun 25, 2024 | Decoder, Language Modeling | —Unverified | 0 |
| Improving Speech-to-Speech Translation Through Unlabeled Text | Oct 26, 2022 | Machine Translation, speech-recognition | —Unverified | 0 |
| Improving the expressiveness of neural vocoding with non-affine Normalizing Flows | Jun 16, 2021 | text-to-speech, Text to Speech | —Unverified | 0 |
| Improving the quality of neural TTS using long-form content and multi-speaker multi-style modeling | Dec 20, 2022 | Form, text-to-speech | —Unverified | 0 |
| A Transfer Learning End-to-End Arabic Text-To-Speech (TTS) Deep Architecture | Jul 22, 2020 | Rhythm, Speech Synthesis | —Unverified | 0 |
| Generating Multilingual Gender-Ambiguous Text-to-Speech Voices | Nov 1, 2022 | text-to-speech, Text to Speech | —Unverified | 0 |
| Incremental Disentanglement for Environment-Aware Zero-Shot Text-to-Speech Synthesis | Dec 22, 2024 | Decoder, Disentanglement | —Unverified | 0 |
| Incremental FastPitch: Chunk-based High Quality Text to Speech | Jan 3, 2024 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Incremental Machine Speech Chain Towards Enabling Listening while Speaking in Real-time | Nov 4, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| From Start to Finish: Latency Reduction Strategies for Incremental Speech Synthesis in Simultaneous Speech-to-Speech Translation | Oct 15, 2021 | Data Augmentation, Simultaneous Speech-to-Speech Translation | —Unverified | 0 |
| Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior | Feb 6, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |