| Guided Flows for Generative Modeling and Decision Making | Nov 22, 2023 | Conditional Image Generation, Decision Making | —Unverified | 0 |
| Cross-speaker Emotion Transfer by Manipulating Speech Style Latents | Mar 15, 2023 | text-to-speech, Text to Speech | —Unverified | 0 |
| A multilingual training strategy for low resource Text to Speech | Sep 2, 2024 | Cross-Lingual Transfer, text-to-speech | —Unverified | 0 |
| A Multi-Agent Framework for Automated Qinqiang Opera Script Generation Using Large Language Models | Apr 22, 2025 | cross-modal alignment, Script Generation | —Unverified | 0 |
| Cross-Lingual Transfer Learning for Phrase Break Prediction with Multilingual Language Model | Jun 5, 2023 | Cross-Lingual Transfer, Language Modeling | —Unverified | 0 |
| Accent Conversion in Text-To-Speech Using Multi-Level VAE and Adversarial Training | Jun 3, 2024 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Cross-lingual Text-To-Speech with Flow-based Voice Conversion for Improved Pronunciation | Oct 31, 2022 | Decoder, Disentanglement | —Unverified | 0 |
| Cross-Lingual Text-to-Speech Using Multi-Task Learning and Speaker Classifier Joint Training | Jan 20, 2022 | Multi-Task Learning, Speech Synthesis | —Unverified | 0 |
| AttentionStitch: How Attention Solves the Speech Editing Problem | Mar 5, 2024 | text-to-speech, Text to Speech | —Unverified | 0 |
| Handling Numeric Expressions in Automatic Speech Recognition | Jul 18, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| GOAT-TTS: Expressive and Realistic Speech Generation via A Dual-Branch LLM | Apr 15, 2025 | Quantization, Reading Comprehension | —Unverified | 0 |
| Cross-lingual Multispeaker Text-to-Speech under Limited-Data Scenario | May 21, 2020 | Attribute, Speech Synthesis | —Unverified | 0 |
| Cross-lingual Multi-speaker Text-to-speech Synthesis for Voice Cloning without Using Parallel Corpus for Unseen Speakers | Nov 26, 2019 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Get Large Language Models Ready to Speak: A Late-fusion Approach for Speech Generation | Oct 27, 2024 | parameter-efficient fine-tuning, Question Answering | —Unverified | 0 |
| GenerTTS: Pronunciation Disentanglement for Timbre and Style Generalization in Cross-Lingual Text-to-Speech | Jun 27, 2023 | Disentanglement, Style Generalization | —Unverified | 0 |
| Grapheme-Coherent Phonemic and Prosodic Annotation of Speech by Implicit and Explicit Grapheme Conditioning | Jun 5, 2025 | text-to-speech, Text to Speech | —Unverified | 0 |
| An Investigation of the Relation Between Grapheme Embeddings and Pronunciation for Tacotron-based Systems | Oct 21, 2020 | Grapheme-to-Phoneme Conversion, Relation | —Unverified | 0 |
| Grapheme-to-Phoneme Transformer Model for Transfer Learning Dialects | Apr 8, 2021 | text-to-speech, Text to Speech | —Unverified | 0 |
| GraphPB: Graphical Representations of Prosody Boundary in Speech Synthesis | Dec 3, 2020 | Decoder, Graph Embedding | —Unverified | 0 |
| GraphSpeech: Syntax-Aware Graph Attention Network For Neural Speech Synthesis | Oct 23, 2020 | Graph Attention, Graph Neural Network | —Unverified | 0 |
| GraphTTS: graph-to-sequence modelling in neural text-to-speech | Mar 4, 2020 | Graph Embedding, Graph-to-Sequence | —Unverified | 0 |
| GRASS: Unified Generation Model for Speech-to-Semantic Tasks | Sep 6, 2023 | named-entity-recognition, Named Entity Recognition | —Unverified | 0 |
| Cross-lingual Knowledge Distillation via Flow-based Voice Conversion for Robust Polyglot Text-To-Speech | Sep 15, 2023 | Knowledge Distillation, Speech Synthesis | —Unverified | 0 |
| Generic Indic Text-to-speech Synthesisers with Rapid Adaptation in an End-to-end Framework | Jun 12, 2020 | text-to-speech, Text to Speech | —Unverified | 0 |
| Cross-Domain Audio Deepfake Detection: Dataset and Analysis | Apr 7, 2024 | Audio Deepfake Detection, DeepFake Detection | —Unverified | 0 |
| Guided-TTS: A Diffusion Model for Text-to-Speech via Classifier Guidance | Nov 23, 2021 | speech-recognition, Speech Recognition | —Unverified | 0 |
| A Methodology for Controlling the Emotional Expressiveness in Synthetic Speech -- a Deep Learning approach | Jul 5, 2019 | text-to-speech, Text to Speech | —Unverified | 0 |
| Improve Cross-lingual Voice Cloning Using Low-quality Code-switched Data | Oct 14, 2021 | text-to-speech, Text to Speech | —Unverified | 0 |
| Hand Sign to Bangla Speech: A Deep Learning in Vision based system for Recognizing Hand Sign Digits and Generating Bangla Speech | Jan 17, 2019 | Gesture Recognition, text-to-speech | —Unverified | 0 |
| Harder or Different? Understanding Generalization of Audio Deepfake Detection | Jun 5, 2024 | Audio Deepfake Detection, DeepFake Detection | —Unverified | 0 |
| Hard-Synth: Synthesizing Diverse Hard Samples for ASR using Zero-Shot TTS and LLM | Nov 20, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Hear Your Code Fail, Voice-Assisted Debugging for Python | Jul 20, 2025 | CPU, Medical Diagnosis | —Unverified | 0 |
| Hierarchical and Multi-Scale Variational Autoencoder for Diverse and Natural Non-Autoregressive Text-to-Speech | Apr 8, 2022 | Diversity, text-to-speech | —Unverified | 0 |
| Hierarchical Context-Aware Transformers for Non-Autoregressive Text to Speech | Jun 29, 2021 | Decoder, Sentence | —Unverified | 0 |
| Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech | Nov 16, 2021 | Diversity, text-to-speech | —Unverified | 0 |
| Hierarchical Multi-Grained Generative Model for Expressive Speech Synthesis | Sep 17, 2020 | Expressive Speech Synthesis, Speech Synthesis | —Unverified | 0 |
| Improved Prosodic Clustering for Multispeaker and Speaker-independent Phoneme-level Prosody Control | Nov 19, 2021 | Clustering, Data Augmentation | —Unverified | 0 |
| Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis | Nov 12, 2020 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Hierarchical Representation of Prosody for Statistical Speech Synthesis | Oct 7, 2015 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Hierarchical Sequence to Sequence Voice Conversion with Limited Data | Jul 15, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| Generative Semantic Communication for Text-to-Speech Synthesis | Oct 4, 2024 | Quantization, Semantic Communication | —Unverified | 0 |
| Generative Pre-training for Speech with Flow Matching | Oct 25, 2023 | Speech Enhancement, Speech Synthesis | —Unverified | 0 |
| HiFiTTS-2: A Large-Scale High Bandwidth Speech Dataset | Jun 4, 2025 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Cross-Dialect Text-To-Speech in Pitch-Accent Language Incorporating Multi-Dialect Phoneme-Level BERT | Sep 11, 2024 | text-to-speech, Text to Speech | —Unverified | 0 |
| Audio Deep Fake Detection System with Neural Stitching for ADD 2022 | Apr 19, 2022 | text-to-speech, Text to Speech | —Unverified | 0 |
| High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models | Sep 27, 2023 | All, Speech Synthesis | —Unverified | 0 |
| Generative Data Augmentation Challenge: Zero-Shot Speech Synthesis for Personalized Speech Enhancement | Jan 23, 2025 | Data Augmentation, Speech Enhancement | —Unverified | 0 |
| Highly Effective Arabic Diacritization using Sequence to Sequence Modeling | Jun 1, 2019 | Feature Engineering, Machine Translation | —Unverified | 0 |
| High-Quality Automatic Voice Over with Accurate Alignment: Supervision through Self-Supervised Discrete Speech Units | Jun 29, 2023 | Speech Synthesis, text-to-speech | —Unverified | 0 |
| Creating New Voices using Normalizing Flows | Dec 22, 2023 | Speech Synthesis, text-to-speech | —Unverified | 0 |