| VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech | Jan 25, 2024 | Decoder, Hallucination | Unverified | 0 |
| SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation | Jan 24, 2024 | text-to-speech, Text to Speech | Code Available | 5 |
| Maximizing Data Efficiency for Cross-Lingual TTS Adaptation by Self-Supervised Representation Mixing and Embedding Initialization | Jan 23, 2024 | text-to-speech, Text to Speech | Unverified | 0 |
| Adversarial speech for voice privacy protection from Personalized Speech generation | Jan 22, 2024 | Speaker Verification, text-to-speech | Unverified | 0 |
| Empowering Communication: Speech Technology for Indian and Western Accents through AI-powered Speech Synthesis | Jan 22, 2024 | Speaker Verification, Speech Synthesis | Unverified | 0 |
| Benchmarking Large Multimodal Models against Common Corruptions | Jan 22, 2024 | Benchmarking, Image to text | Code Available | 1 |
| Data-driven grapheme-to-phoneme representations for a lexicon-free text-to-speech | Jan 19, 2024 | Self-Supervised Learning, text-to-speech | Unverified | 0 |
| DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations without Text Alignment | Jan 16, 2024 | Disentanglement, Self-Supervised Learning | Code Available | 2 |
| MCMChaos: Improvising Rap Music with MCMC Methods and Chaos Theory | Jan 15, 2024 | Music Generation, text-to-speech | Unverified | 0 |
| ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering | Jan 14, 2024 | Audio Generation, Language Modeling | Unverified | 0 |
| Multi-Task Learning for Front-End Text Processing in TTS | Jan 12, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| End to end Hindi to English speech conversion using Bark, mBART and a finetuned XLSR Wav2Vec2 | Jan 11, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Noise-robust zero-shot text-to-speech synthesis conditioned on self-supervised speech-representation model with adapters | Jan 10, 2024 | Self-Supervised Learning, Speech Enhancement | Unverified | 0 |
| Transfer the linguistic representations from TTS to accent conversion with non-parallel data | Jan 7, 2024 | text-to-speech, Text to Speech | Unverified | 0 |
| Evaluating and Personalizing User-Perceived Quality of Text-to-Speech Voices for Delivering Mindfulness Meditation with Different Physical Embodiments | Jan 7, 2024 | text-to-speech, Text to Speech | Unverified | 0 |
| Incremental FastPitch: Chunk-based High Quality Text to Speech | Jan 3, 2024 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction | Jan 3, 2024 | text-to-speech, Text to Speech | Unverified | 0 |
| Boosting Large Language Model for Speech Synthesis: An Empirical Study | Dec 30, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| Normalization of Lithuanian Text Using Regular Expressions | Dec 29, 2023 | Speech Synthesis, Text Normalization | Unverified | 0 |
| AE-Flow: AutoEncoder Normalizing Flow | Dec 27, 2023 | text-to-speech, Text to Speech | Unverified | 0 |
| Creating New Voices using Normalizing Flows | Dec 22, 2023 | Speech Synthesis, text-to-speech | Unverified | 0 |
| External Knowledge Augmented Polyphone Disambiguation Using Large Language Model | Dec 19, 2023 | Decoder, Language Modeling | Unverified | 0 |
| A review-based study on different Text-to-Speech technologies | Dec 17, 2023 | text-to-speech, Text to Speech | Unverified | 0 |
| MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis | Dec 17, 2023 | Speech Synthesis, Style Transfer | Unverified | 0 |
| Neural Text to Articulate Talk: Deep Text to Audiovisual Speech Synthesis achieving both Auditory and Photo-realism | Dec 11, 2023 | Face Generation, Lip Reading | Code Available | 1 |
| An Experimental Study: Assessing the Combined Framework of WavLM and BEST-RQ for Text-to-Speech Synthesis | Dec 8, 2023 | Benchmarking, Quantization | Unverified | 0 |
| Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis | Dec 6, 2023 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Rapid Speaker Adaptation in Low Resource Text to Speech Systems using Synthetic Data and Transfer learning | Dec 2, 2023 | Decoder, text-to-speech | Unverified | 0 |
| Code-Mixed Text to Speech Synthesis under Low-Resource Constraints | Dec 2, 2023 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Vulnerability of Automatic Identity Recognition to Audio-Visual Deepfakes | Nov 29, 2023 | Face Recognition, Face Swapping | Unverified | 0 |
| Learning Arousal-Valence Representation from Categorical Emotion Labels of Speech | Nov 24, 2023 | Dimensionality Reduction, Emotion Classification | Code Available | 1 |
| Guided Flows for Generative Modeling and Decision Making | Nov 22, 2023 | Conditional Image Generation, Decision Making | Unverified | 0 |
| HierSpeech++: Bridging the Gap between Semantic and Acoustic Representation of Speech by Hierarchical Variational Inference for Zero-shot Speech Synthesis | Nov 21, 2023 | Speech Synthesis, Super-Resolution | Code Available | 3 |
| Data Center Audio/Video Intelligence on Device (DAVID) -- An Edge-AI Platform for Smart-Toys | Nov 18, 2023 | text-to-speech, Text to Speech | Unverified | 0 |
| Utilizing Speech Emotion Recognition and Recommender Systems for Negative Emotion Handling in Therapy Chatbots | Nov 18, 2023 | Chatbot, Emotion Recognition | Unverified | 0 |
| A Study on Altering the Latent Space of Pretrained Text to Speech Models for Improved Expressiveness | Nov 17, 2023 | text-to-speech, Text to Speech | Unverified | 0 |
| Improving fairness for spoken language understanding in atypical speech with Text-to-Speech | Nov 16, 2023 | Data Augmentation, Fairness | Code Available | 1 |
| ChatAnything: Facetime Chat with LLM-Enhanced Personas | Nov 12, 2023 | Image Generation, In-Context Learning | Unverified | 0 |
| Synthetic Speaking Children -- Why We Need Them and How to Make Them | Nov 8, 2023 | text-to-speech, Text to Speech | Unverified | 0 |
| Character-Level Bangla Text-to-IPA Transcription Using Transformer Architecture with Sequence Alignment | Nov 7, 2023 | Decoder, Position | Unverified | 0 |
| Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning | Nov 7, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Transduce and Speak: Neural Transducer for Text-to-Speech with Semantic Token Prediction | Nov 6, 2023 | text-to-speech, Text to Speech | Unverified | 0 |
| E3 TTS: Easy End-to-End Diffusion-based Text to Speech | Nov 2, 2023 | text-to-speech, Text to Speech | Unverified | 0 |
| Expressive TTS Driven by Natural Language Prompts Using Few Human Annotations | Nov 2, 2023 | Language Modeling, Language Modelling | Unverified | 0 |
| Style Description based Text-to-Speech with Conditional Prosodic Layer Normalization based Diffusion GAN | Oct 27, 2023 | Decoder, Denoising | Unverified | 0 |
| Generative Pre-training for Speech with Flow Matching | Oct 25, 2023 | Speech Enhancement, Speech Synthesis | Unverified | 0 |
| ArTST: Arabic Text and Speech Transformer | Oct 25, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| DPP-TTS: Diversifying prosodic features of speech via determinantal point processes | Oct 23, 2023 | Diversity, Point Processes | Unverified | 0 |
| An overview of text-to-speech systems and media applications | Oct 22, 2023 | Acoustic Modelling, text-to-speech | Unverified | 0 |
| Generative Adversarial Training for Text-to-Speech Synthesis Based on Raw Phonetic Input and Explicit Prosody Modelling | Oct 14, 2023 | Speech Synthesis, text-to-speech | Code Available | 2 |