| Accented Text-to-Speech Synthesis with Limited Data | May 8, 2023 | Speech Synthesis, text-to-speech | Unverified | 0 | 0 |
| Data Center Audio/Video Intelligence on Device (DAVID) -- An Edge-AI Platform for Smart-Toys | Nov 18, 2023 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| Improved Prosodic Clustering for Multispeaker and Speaker-independent Phoneme-level Prosody Control | Nov 19, 2021 | Clustering, Data Augmentation | Unverified | 0 | 0 |
| Data Augmentation Methods for End-to-end Speech Recognition on Distant-Talk Scenarios | Jun 7, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| DASB -- Discrete Audio and Speech Benchmark | Jun 20, 2024 | Benchmarking, Emotion Recognition | Unverified | 0 | 0 |
| IMaSC -- ICFOSS Malayalam Speech Corpus | Nov 23, 2022 | Sentence, text-to-speech | Unverified | 0 | 0 |
| DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech | Oct 17, 2024 | Disentanglement, Quantization | Unverified | 0 | 0 |
| Analysis and Utilization of Entrainment on Acoustic and Emotion Features in User-agent Dialogue | Dec 7, 2022 | Spoken Dialogue Systems, text-to-speech | Unverified | 0 | 0 |
| HybridNet: A Hybrid Neural Architecture to Speed-up Autoregressive Models | Jan 1, 2018 | Speech Synthesis, text-to-speech | Unverified | 0 | 0 |
| Huqariq: A Multilingual Speech Corpus of Native Languages of Peru for Speech Recognition | Jun 1, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| Daisy-TTS: Simulating Wider Spectrum of Emotions via Prosody Embedding Decomposition | Feb 22, 2024 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| Huqariq: A Multilingual Speech Corpus of Native Languages of Peru for Speech Recognition | Jul 12, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| Human-in-the-loop Speaker Adaptation for DNN-based Multi-speaker TTS | Jun 21, 2022 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| Cycle-consistency training for end-to-end speech recognition | Nov 2, 2018 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| Impact of Frame Rates on Speech Tokenizer: A Case Study on Mandarin and English | May 20, 2025 | Automatic Speech Recognition, speech-recognition | Unverified | 0 | 0 |
| Improve Cross-lingual Voice Cloning Using Low-quality Code-switched Data | Oct 14, 2021 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| Human Detection of Political Speech Deepfakes across Transcripts, Audio, and Video | Feb 25, 2022 | Face Swapping, Human Detection | Unverified | 0 | 0 |
| Customizing Grapheme-to-Phoneme System for Non-Trivial Transcription Problems in Bangla Language | Jun 1, 2019 | speech-recognition, Speech Recognition | Unverified | 0 | 0 |
| Improve few-shot voice cloning using multi-modal learning | Mar 18, 2022 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| Improving Accent Conversion with Reference Encoder and End-To-End Text-To-Speech | May 19, 2020 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| AudioJailbreak: Jailbreak Attacks against End-to-End Large Audio-Language Models | May 20, 2025 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model | Jun 6, 2024 | Language Modeling, Language Modelling | Unverified | 0 | 0 |
| Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation | Jun 14, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| Improving Contextual Recognition of Rare Words with an Alternate Spelling Prediction Model | Sep 2, 2022 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| An Algorithm Based on Empirical Methods, for Text-to-Tuneful-Speech Synthesis of Sanskrit Verse | Sep 15, 2014 | Speech Synthesis, text-to-speech | Unverified | 0 | 0 |
| Improving Deliberation by Text-Only and Semi-Supervised Training | Jun 29, 2022 | Decoder, Language Modeling | Unverified | 0 | 0 |
| HMM-based data augmentation for E2E systems for building conversational speech synthesis systems | Dec 22, 2022 | Data Augmentation, Language Modeling | Unverified | 0 | 0 |
| Improving Grapheme-to-Phoneme Conversion through In-Context Knowledge Retrieval with Large Language Models | Nov 12, 2024 | Grapheme-to-Phoneme Conversion, Retrieval | Unverified | 0 | 0 |
| CUIfy the XR: An Open-Source Package to Embed LLM-powered Conversational Agents in XR | Nov 7, 2024 | Language Modelling, Large Language Model | Unverified | 0 | 0 |
| Improving Low Resource Code-switched ASR using Augmented Code-switched TTS | Oct 12, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| Improving LPCNet-based Text-to-Speech with Linear Prediction-structured Mixture Density Network | Jan 31, 2020 | Quantization, Speech Synthesis | Unverified | 0 | 0 |
| Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information | Aug 31, 2023 | Decoder, Multi-Task Learning | Unverified | 0 | 0 |
| Improving multi-speaker TTS prosody variance with a residual encoder and normalizing flows | Jun 10, 2021 | Disentanglement, Sentence | Unverified | 0 | 0 |
| Improving Noise Robustness of LLM-based Zero-shot TTS via Discrete Acoustic Token Denoising | May 20, 2025 | Decoder, Denoising | Unverified | 0 | 0 |
| Improving Performance of End-to-End ASR on Numeric Sequences | Jul 1, 2019 | speech-recognition, Speech Recognition | Unverified | 0 | 0 |
| Improving prosodic phrasing of Vietnamese text-to-speech systems | Dec 1, 2020 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| Improving Prosody Modelling with Cross-Utterance BERT Embeddings for End-to-end Speech Synthesis | Nov 6, 2020 | Decoder, Sentence | Unverified | 0 | 0 |
| Improving Readability for Automatic Speech Recognition Transcription | Apr 9, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| HLTCOE JHU Submission to the Voice Privacy Challenge 2024 | Sep 13, 2024 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment | Jun 25, 2024 | Decoder, Language Modeling | Unverified | 0 | 0 |
| Improving Speech-to-Speech Translation Through Unlabeled Text | Oct 26, 2022 | Machine Translation, speech-recognition | Unverified | 0 | 0 |
| Improving the expressiveness of neural vocoding with non-affine Normalizing Flows | Jun 16, 2021 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| Improving the quality of neural TTS using long-form content and multi-speaker multi-style modeling | Dec 20, 2022 | Form, text-to-speech | Unverified | 0 | 0 |
| Cued Speech Generation Leveraging a Pre-trained Audiovisual Text-to-Speech Model | Jan 8, 2025 | text-to-speech, Text to Speech | Unverified | 0 | 0 |
| Incorporating speaker embedding and post-filter network for improving speaker similarity of personalized speech synthesis system | Oct 1, 2021 | Speaker Verification, Speech Synthesis | Unverified | 0 | 0 |
| Incremental Disentanglement for Environment-Aware Zero-Shot Text-to-Speech Synthesis | Dec 22, 2024 | Decoder, Disentanglement | Unverified | 0 | 0 |
| A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI | Mar 23, 2023 | Speech Enhancement, Speech Synthesis | Unverified | 0 | 0 |
| Incremental Machine Speech Chain Towards Enabling Listening while Speaking in Real-time | Nov 4, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 | 0 |
| High Quality Streaming Speech Synthesis with Low, Sentence-Length-Independent Latency | Nov 17, 2021 | CPU, Decoder | Unverified | 0 | 0 |
| High-Quality Automatic Voice Over with Accurate Alignment: Supervision through Self-Supervised Discrete Speech Units | Jun 29, 2023 | Speech Synthesis, text-to-speech | Unverified | 0 | 0 |