| Boosting Large Language Model for Speech Synthesis: An Empirical Study | Dec 30, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| Normalization of Lithuanian Text Using Regular Expressions | Dec 29, 2023 | Speech Synthesis, Text Normalization | Unverified | 0 |
| AE-Flow: AutoEncoder Normalizing Flow | Dec 27, 2023 | text-to-speech, Text to Speech | Unverified | 0 |
| Creating New Voices using Normalizing Flows | Dec 22, 2023 | Speech Synthesis, text-to-speech | Unverified | 0 |
| External Knowledge Augmented Polyphone Disambiguation Using Large Language Model | Dec 19, 2023 | Decoder, Language Modeling | Unverified | 0 |
| A review-based study on different Text-to-Speech technologies | Dec 17, 2023 | text-to-speech, Text to Speech | Unverified | 0 |
| MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis | Dec 17, 2023 | Speech Synthesis, Style Transfer | Unverified | 0 |
| An Experimental Study: Assessing the Combined Framework of WavLM and BEST-RQ for Text-to-Speech Synthesis | Dec 8, 2023 | Benchmarking, Quantization | Unverified | 0 |
| Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis | Dec 6, 2023 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Rapid Speaker Adaptation in Low Resource Text to Speech Systems using Synthetic Data and Transfer learning | Dec 2, 2023 | Decoder, text-to-speech | Unverified | 0 |
| Code-Mixed Text to Speech Synthesis under Low-Resource Constraints | Dec 2, 2023 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Vulnerability of Automatic Identity Recognition to Audio-Visual Deepfakes | Nov 29, 2023 | Face Recognition, Face Swapping | Unverified | 0 |
| Guided Flows for Generative Modeling and Decision Making | Nov 22, 2023 | Conditional Image Generation, Decision Making | Unverified | 0 |
| Data Center Audio/Video Intelligence on Device (DAVID) -- An Edge-AI Platform for Smart-Toys | Nov 18, 2023 | text-to-speech, Text to Speech | Unverified | 0 |
| Utilizing Speech Emotion Recognition and Recommender Systems for Negative Emotion Handling in Therapy Chatbots | Nov 18, 2023 | Chatbot, Emotion Recognition | Unverified | 0 |
| A Study on Altering the Latent Space of Pretrained Text to Speech Models for Improved Expressiveness | Nov 17, 2023 | text-to-speech, Text to Speech | Unverified | 0 |
| ChatAnything: Facetime Chat with LLM-Enhanced Personas | Nov 12, 2023 | Image Generation, In-Context Learning | Unverified | 0 |
| Synthetic Speaking Children -- Why We Need Them and How to Make Them | Nov 8, 2023 | text-to-speech, Text to Speech | Unverified | 0 |
| Character-Level Bangla Text-to-IPA Transcription Using Transformer Architecture with Sequence Alignment | Nov 7, 2023 | Decoder, Position | Unverified | 0 |
| Transduce and Speak: Neural Transducer for Text-to-Speech with Semantic Token Prediction | Nov 6, 2023 | text-to-speech, Text to Speech | Unverified | 0 |
| E3 TTS: Easy End-to-End Diffusion-based Text to Speech | Nov 2, 2023 | text-to-speech, Text to Speech | Unverified | 0 |
| Expressive TTS Driven by Natural Language Prompts Using Few Human Annotations | Nov 2, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| Style Description based Text-to-Speech with Conditional Prosodic Layer Normalization based Diffusion GAN | Oct 27, 2023 | Decoder, Denoising | Unverified | 0 |
| Generative Pre-training for Speech with Flow Matching | Oct 25, 2023 | Speech Enhancement, Speech Synthesis | Unverified | 0 |
| DPP-TTS: Diversifying prosodic features of speech via determinantal point processes | Oct 23, 2023 | Diversity, Point Processes | Unverified | 0 |
| An overview of text-to-speech systems and media applications | Oct 22, 2023 | Acoustic Modelling, text-to-speech | Unverified | 0 |
| Attentive Multi-Layer Perceptron for Non-autoregressive Generation | Oct 14, 2023 | Machine Translation, Speech Synthesis | Code Available | 0 |
| On the Relevance of Phoneme Duration Variability of Synthesized Training Data for Automatic Speech Recognition | Oct 12, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Prosody Analysis of Audiobooks | Oct 10, 2023 | Attribute, Language Modeling | Code Available | 0 |
| Neutral TTS Female Voice Corpus in Brazilian Portuguese | Oct 8, 2023 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Unified speech and gesture synthesis using flow matching | Oct 8, 2023 | Audio Synthesis, Motion Synthesis | Unverified | 0 |
| Comparative Analysis of Transfer Learning in Deep Learning Text-to-Speech Models on a Few-Shot, Low-Resource, Customized Dataset | Oct 8, 2023 | text-to-speech, Text to Speech | Unverified | 0 |
| Latent Filling: Latent Space Data Augmentation for Zero-shot Speech Synthesis | Oct 5, 2023 | Data Augmentation, Speech Synthesis | Unverified | 0 |
| The VoiceMOS Challenge 2023: Zero-shot Subjective Speech Quality Prediction for Multiple Domains | Oct 4, 2023 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Towards human-like spoken dialogue generation between AI agents from written dialogue | Oct 2, 2023 | Dialogue Generation, text-to-speech | Unverified | 0 |
| Low-Resource Self-Supervised Learning with SSL-Enhanced TTS | Sep 29, 2023 | Self-Supervised Learning, text-to-speech | Unverified | 0 |
| Synthetic Speech Detection Based on Temporal Consistency and Distribution of Speaker Features | Sep 29, 2023 | Synthetic Speech Detection, text-to-speech | Unverified | 0 |
| High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models | Sep 27, 2023 | All, Speech Synthesis | Unverified | 0 |
| Face-StyleSpeech: Enhancing Zero-shot Speech Synthesis from Face Images with Improved Face-to-Speech Mapping | Sep 25, 2023 | Speech Synthesis, text-to-speech | Unverified | 0 |
| VoiceLDM: Text-to-Speech with Environmental Context | Sep 24, 2023 | AudioCaps, text-to-speech | Unverified | 0 |
| DurIAN-E: Duration Informed Attention Network For Expressive Text-to-Speech Synthesis | Sep 22, 2023 | Denoising, Speech Synthesis | Unverified | 0 |
| The Impact of Silence on Speech Anti-Spoofing | Sep 21, 2023 | Action Detection, Activity Detection | Unverified | 0 |
| Speak While You Think: Streaming Speech Synthesis During Text Generation | Sep 20, 2023 | Speech Synthesis, Text Generation | Unverified | 0 |
| Exploring Speech Enhancement for Low-resource Speech Synthesis | Sep 19, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition | Sep 19, 2023 | Data Augmentation, Emotion Recognition | Unverified | 0 |
| Augmenting text for spoken language understanding with Large Language Models | Sep 17, 2023 | Semantic Parsing, Spoken Language Understanding | Unverified | 0 |
| PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-to-Speech Using Natural Language Descriptions | Sep 15, 2023 | text-to-speech, Text to Speech | Unverified | 0 |
| Cross-lingual Knowledge Distillation via Flow-based Voice Conversion for Robust Polyglot Text-To-Speech | Sep 15, 2023 | Knowledge Distillation, Speech Synthesis | Unverified | 0 |
| Direct Text to Speech Translation System using Acoustic Units | Sep 14, 2023 | Decoder, Speech-to-Speech Translation | Unverified | 0 |
| Cross-Utterance Conditioned VAE for Speech Generation | Sep 8, 2023 | Speech Synthesis, text-to-speech | Unverified | 0 |