iSTFTNet2: Faster and More Lightweight iSTFT-Based Neural Vocoder Using 1D-2D CNN | Aug 14, 2023 | Speech Synthesis | Unverified 0
EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis | Aug 10, 2023 | Resynthesis, Speech Synthesis | Unverified 0
On Error Propagation of Diffusion Models | Aug 9, 2023 | Denoising, Image Generation | Unverified 0
Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS | Aug 3, 2023 | Denoising, Speech Synthesis | Unverified 0
Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation | Aug 3, 2023 | Decoder, Quantization | Code Available 1
SALTTS: Leveraging Self-Supervised Speech Representations for improved Text-to-Speech Synthesis | Aug 2, 2023 | Decoder, Self-Supervised Learning | Unverified 0
Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech | Jul 31, 2023 | Acoustic Modelling, Speech Synthesis | Unverified 0
DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training | Jul 31, 2023 | Denoising, Expressive Speech Synthesis | Code Available 1
Audio-visual video-to-speech synthesis with synthesized input audio | Jul 31, 2023 | Speech Synthesis | Unverified 0
METTS: Multilingual Emotional Text-to-Speech by Cross-speaker and Cross-lingual Emotion Transfer | Jul 29, 2023 | Disentanglement, Diversity | Unverified 0
Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding | Jul 28, 2023 | Language Modeling, Language Modelling | Unverified 0
SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer | Jul 20, 2023 | Expressive Speech Synthesis, Language Modelling | Code Available 1
An analysis on the effects of speaker embedding choice in non auto-regressive TTS | Jul 19, 2023 | Representation Learning, Speech Synthesis | Unverified 0
SLMGAN: Exploiting Speech Language Model Representations for Unsupervised Zero-Shot Voice Conversion in GANs | Jul 18, 2023 | Generative Adversarial Network, Language Modeling | Unverified 0
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis | Jul 14, 2023 | In-Context Learning, Language Modelling | Unverified 0
On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis | Jul 11, 2023 | Prediction, Self-Supervised Learning | Unverified 0
Deep Speech Synthesis from MRI-Based Articulatory Representations | Jul 5, 2023 | Computational Efficiency, Denoising | Code Available 1
Disentanglement in a GAN for Unconditional Speech Synthesis | Jul 4, 2023 | Disentanglement, Generative Adversarial Network | Code Available 1
RobustL2S: Speaker-Specific Lip-to-Speech Synthesis exploiting Self-Supervised Representations | Jul 3, 2023 | Lip to Speech Synthesis, Speaker-Specific Lip to Speech Synthesis | Unverified 0
High-Quality Automatic Voice Over with Accurate Alignment: Supervision through Self-Supervised Discrete Speech Units | Jun 29, 2023 | Speech Synthesis, text-to-speech | Unverified 0
EmoSpeech: Guiding FastSpeech2 Towards Emotional Text to Speech | Jun 28, 2023 | Emotion Recognition, Speech Synthesis | Code Available 1
Large-scale unsupervised audio pre-training for video-to-speech synthesis | Jun 27, 2023 | speech-recognition, Speech Recognition | Unverified 0
DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech | Jun 25, 2023 | Speech Synthesis, text-to-speech | Unverified 0
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale | Jun 23, 2023 | In-Context Learning, Speech Synthesis | Code Available 0
Strategies in Transfer Learning for Low-Resource Speech Synthesis: Phone Mapping, Features Input, and Source Language Selection | Jun 21, 2023 | Automatic Speech Recognition, speech-recognition | Unverified 0
Visual-Aware Text-to-Speech | Jun 21, 2023 | Rhythm, Speech Synthesis | Unverified 0
Cross-lingual Prosody Transfer for Expressive Machine Dubbing | Jun 20, 2023 | Expressive Speech Synthesis, Speech Synthesis | Unverified 0
CML-TTS A Multilingual Dataset for Speech Synthesis in Low-Resource Languages | Jun 16, 2023 | Speech Synthesis, text-to-speech | Unverified 0
Investigating the Utility of Surprisal from Large Language Models for Speech Synthesis Prosody | Jun 16, 2023 | Speech Synthesis | Unverified 0
Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis | Jun 15, 2023 | Denoising, Speech Synthesis | Unverified 0
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models | Jun 13, 2023 | Speech Synthesis, text-to-speech | Code Available 5
PauseSpeech: Natural Speech Synthesis via Pre-trained Language Model and Pause-based Prosody Modeling | Jun 13, 2023 | Language Modeling, Language Modelling | Unverified 0
HiddenSinger: High-Quality Singing Voice Synthesis via Neural Audio Codec and Latent Diffusion Models | Jun 12, 2023 | Denoising, Singing Voice Synthesis | Unverified 0
Boosting Fast and High-Quality Speech Synthesis with Linear Diffusion | Jun 9, 2023 | Denoising, Speech Synthesis | Unverified 0
VIFS: An End-to-End Variational Inference for Foley Sound Synthesis | Jun 8, 2023 | Speech Synthesis, text-to-speech | Code Available 0
Take the Hint: Improving Arabic Diacritization with Partially-Diacritized Text | Jun 6, 2023 | Speech Synthesis | Code Available 0
PolyVoice: Language Models for Speech to Speech Translation | Jun 5, 2023 | Language Modeling, Language Modelling | Unverified 0
Rhythm-controllable Attention with High Robustness for Long Sentence Speech Synthesis | Jun 5, 2023 | Rhythm, Sentence | Unverified 0
Why We Should Report the Details in Subjective Evaluation of TTS More Rigorously | Jun 3, 2023 | Speech Synthesis | Code Available 0
Speaker-independent neural formant synthesis | Jun 2, 2023 | Speech Synthesis | Unverified 0
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis | Jun 1, 2023 | Audio Synthesis, Computational Efficiency | Code Available 4
Speech inpainting: Context-based speech synthesis guided by video | Jun 1, 2023 | speech-recognition, Speech Recognition | Unverified 0
Text-to-Speech Pipeline for Swiss German -- A comparison | May 31, 2023 | Speech Synthesis, text-to-speech | Unverified 0
Intelligible Lip-to-Speech Synthesis with Speech Units | May 31, 2023 | Lip to Speech Synthesis, Speech Synthesis | Code Available 1
Automatic Evaluation of Turn-taking Cues in Conversational Speech Synthesis | May 29, 2023 | Speech Synthesis, text-to-speech | Unverified 0
ADAPTERMIX: Exploring the Efficacy of Mixture of Adapters for Low-Resource TTS Adaptation | May 29, 2023 | Speech Synthesis, text-to-speech | Code Available 1
Creating Personalized Synthetic Voices from Post-Glossectomy Speech with Guided Diffusion Models | May 27, 2023 | Speech Synthesis, Voice Conversion | Unverified 0
Automatic Tuning of Loss Trade-offs without Hyper-parameter Search in End-to-End Zero-Shot Speech Synthesis | May 26, 2023 | Decoder, Speech Synthesis | Code Available 1
Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration | May 25, 2023 | Speech Synthesis, text-to-speech | Code Available 1
Spoken Question Answering and Speech Continuation Using Spectrogram-Powered LLM | May 24, 2023 | Language Modelling, Question Answering | Code Available 0