ChatGPT-EDSS: Empathetic Dialogue Speech Synthesis Trained from ChatGPT-derived Context Word Embeddings | May 23, 2023 | Chatbot, Reading Comprehension | Unverified 0
CALLS: Japanese Empathetic Dialogue Speech Corpus of Complaint Handling and Attentive Listening in Customer Center | May 23, 2023 | Speech Synthesis | Unverified 0
ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models | May 23, 2023 | Speech Synthesis, text-to-speech | Unverified 0
Text Generation with Speech Synthesis for ASR Data Augmentation | May 22, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified 0
Scaling Speech Technology to 1,000+ Languages | May 22, 2023 | Automatic Speech Recognition, Language Identification | Code Available 1
EMNS /Imz/ Corpus: An emotive single-speaker dataset for narrative storytelling in games, television and graphic novels | May 22, 2023 | Expressive Speech Synthesis, Speech Synthesis | Code Available 1
VAKTA-SETU: A Speech-to-Speech Machine Translation Service in Select Indic Languages | May 21, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified 0
MParrotTTS: Multilingual Multi-speaker Text to Speech Synthesis in Low Resource Setting | May 19, 2023 | Speech Synthesis, text-to-speech | Unverified 0
Improving Generalization Ability of Countermeasures for New Mismatch Scenario by Combining Multiple Advanced Regularization Terms | May 18, 2023 | Speech Synthesis | Code Available 0
Diffusion-Based Mel-Spectrogram Enhancement for Personalized Speech Synthesis with Found Data | May 18, 2023 | Speech Enhancement, Speech Synthesis | Code Available 1
A unified front-end framework for English text-to-speech synthesis | May 18, 2023 | Speech Synthesis, Text Normalization | Unverified 0
Empirical Analysis of Oral and Nasal Vowels of Konkani | May 17, 2023 | Speech Synthesis | Unverified 0
Better speech synthesis through scaling | May 12, 2023 | Image Generation, Speech Synthesis | Code Available 6
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model | May 11, 2023 | Denoising, GPU | Code Available 2
Zero-shot personalized lip-to-speech synthesis with face image based voice control | May 9, 2023 | Lip to Speech Synthesis, Representation Learning | Unverified 0
Accented Text-to-Speech Synthesis with Limited Data | May 8, 2023 | Speech Synthesis, text-to-speech | Unverified 0
Bts-e: Audio deepfake detection using breathing-talking-silence encoder | May 5, 2023 | Audio Deepfake Detection, DeepFake Detection | Code Available 1
M2-CTTS: End-to-End Multi-scale Multi-modal Conversational Text-to-Speech Synthesis | May 3, 2023 | Speech Synthesis, text-to-speech | Unverified 0
A Review of Deep Learning Techniques for Speech Processing | Apr 30, 2023 | Automatic Speech Recognition, Deep Learning | Unverified 0
Source-Filter-Based Generative Adversarial Neural Vocoder for High Fidelity Speech Synthesis | Apr 26, 2023 | Speech Synthesis, text-to-speech | Code Available 2
Zero-shot text-to-speech synthesis conditioned using self-supervised speech representation model | Apr 24, 2023 | Rhythm, Self-Supervised Learning | Unverified 0
Enhancing Suno's Bark Text-to-Speech Model: Addressing Limitations Through Meta's Encodec and Pre-Trained Hubert | Apr 18, 2023 | Audio Generation, Expressive Speech Synthesis | Code Available 4
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers | Apr 18, 2023 | In-Context Learning, Speech Synthesis | Code Available 2
Ensemble prosody prediction for expressive speech synthesis | Apr 3, 2023 | Diversity, Ensemble Learning | Unverified 0
Text is All You Need: Personalizing ASR Models using Controllable Speech Synthesis | Mar 27, 2023 | All, Automatic Speech Recognition | Unverified 0
Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis | Mar 24, 2023 | Generative Adversarial Network, Speech Synthesis | Unverified 0
A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI | Mar 23, 2023 | Speech Enhancement, Speech Synthesis | Unverified 0
Transformers in Speech Processing: A Survey | Mar 21, 2023 | Automatic Speech Recognition, Speech Enhancement | Unverified 0
Controllable Prosody Generation With Partial Inputs | Mar 14, 2023 | Speech Synthesis, text-to-speech | Unverified 0
VANI: Very-lightweight Accent-controllable TTS for Native and Non-native speakers with Identity Preservation | Mar 14, 2023 | Disentanglement, Speech Synthesis | Unverified 0
QI-TTS: Questioning Intonation Control for Emotional Speech Synthesis | Mar 14, 2023 | Emotional Speech Synthesis, Sentence | Unverified 0
Improving Prosody for Cross-Speaker Style Transfer by Semi-Supervised Style Extractor and Hierarchical Modeling in Speech Synthesis | Mar 14, 2023 | Prosody Prediction, Speech Synthesis | Unverified 0
Do Prosody Transfer Models Transfer Prosody? | Mar 7, 2023 | Speech Synthesis, text-to-speech | Unverified 0
Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling | Mar 7, 2023 | In-Context Learning, Language Modeling | Code Available 5
FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model | Mar 6, 2023 | Language Modeling, Language Modelling | Unverified 0
Evaluating Parameter-Efficient Transfer Learning Approaches on SURE Benchmark for Speech Understanding | Mar 2, 2023 | Speech Synthesis, text-to-speech | Code Available 1
ParrotTTS: Text-to-Speech synthesis by exploiting self-supervised representations | Mar 1, 2023 | Self-Supervised Learning, Speech Synthesis | Unverified 0
On the Audio-visual Synchronization for Lip-to-Speech Synthesis | Mar 1, 2023 | Audio-Visual Synchronization, Lip to Speech Synthesis | Unverified 0
DTW-SiameseNet: Dynamic Time Warped Siamese Network for Mispronunciation Detection and Correction | Mar 1, 2023 | Dynamic Time Warping, Metric Learning | Unverified 0
ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus | Feb 28, 2023 | Speech Synthesis, text-to-speech | Unverified 0
UniFLG: Unified Facial Landmark Generator from Text or Speech | Feb 28, 2023 | Decoder, Face Generation | Unverified 0
CrossSpeech: Speaker-independent Acoustic Representation for Cross-lingual Speech Synthesis | Feb 28, 2023 | Speech Synthesis, text-to-speech | Unverified 0
Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech | Feb 27, 2023 | Speech Synthesis, text-to-speech | Code Available 1
Lip-to-Speech Synthesis in the Wild with Multi-task Learning | Feb 17, 2023 | Lip to Speech Synthesis, Multi-Task Learning | Code Available 1
Fast and small footprint Hybrid HMM-HiFiGAN based system for speech synthesis in Indian languages | Feb 13, 2023 | Speech Synthesis, text-to-speech | Unverified 0
A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech | Feb 8, 2023 | Code Generation, Diversity | Code Available 2
Beyond Statistical Similarity: Rethinking Metrics for Deep Generative Models in Engineering Design | Feb 6, 2023 | Drug Discovery, Learning Theory | Unverified 0
Cross-modal information fusion for voice spoofing detection | Feb 1, 2023 | Automatic Speech Recognition, fake voice detection | Code Available 1
UzbekTagger: The rule-based POS tagger for Uzbek language | Jan 30, 2023 | Language Modeling, Language Modelling | Unverified 0
Time out of Mind: Generating Rate of Speech conditioned on emotion and speaker | Jan 29, 2023 | Speech Synthesis, text-to-speech | Code Available 0