Corpus Synthesis for Zero-shot ASR domain Adaptation using Large Language Models Sep 18, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Cross-lingual Knowledge Distillation via Flow-based Voice Conversion for Robust Polyglot Text-To-Speech Sep 15, 2023 Knowledge Distillation Speech Synthesis
Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks Sep 14, 2023 Decoder Language Modeling
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS Sep 14, 2023 Self-Supervised Learning speech-recognition
CleanUNet 2: A Hybrid Speech Denoising Model on Waveform and Spectrogram Sep 12, 2023 Denoising Speech Denoising
Can large-scale vocoded spoofed data improve speech spoofing countermeasure with a self-supervised front end? Sep 12, 2023 Self-Supervised Learning Speech Synthesis
Cross-Utterance Conditioned VAE for Speech Generation Sep 8, 2023 Speech Synthesis text-to-speech
MuLanTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2023 Sep 6, 2023 Speech Synthesis text-to-speech
The FruitShell French synthesis system at the Blizzard 2023 Challenge Sep 1, 2023 Data Augmentation Speech Synthesis
Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis Aug 31, 2023 Expressive Speech Synthesis Sentence
The DeepZen Speech Synthesis System for Blizzard Challenge 2023 Aug 30, 2023 Sentence Speech Synthesis
Generalizable Zero-Shot Speaker Adaptive Speech Synthesis with Disentangled Representations Aug 24, 2023 Representation Learning Speech Synthesis
TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition Aug 21, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
AffectEcho: Speaker Independent and Language-Agnostic Emotion and Affect Transfer for Speech Synthesis Aug 16, 2023 Attribute Speech Synthesis
Accurate synthesis of Dysarthric Speech for ASR data augmentation Aug 16, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
iSTFTNet2: Faster and More Lightweight iSTFT-Based Neural Vocoder Using 1D-2D CNN Aug 14, 2023 Speech Synthesis
EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis Aug 10, 2023 Resynthesis Speech Synthesis
On Error Propagation of Diffusion Models Aug 9, 2023 Denoising Image Generation
Adversarial Training of Denoising Diffusion Model Using Dual Discriminators for High-Fidelity Multi-Speaker TTS Aug 3, 2023 Denoising Speech Synthesis
SALTTS: Leveraging Self-Supervised Speech Representations for improved Text-to-Speech Synthesis Aug 2, 2023 Decoder Self-Supervised Learning
Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech Jul 31, 2023 Acoustic Modelling Speech Synthesis
Audio-visual video-to-speech synthesis with synthesized input audio Jul 31, 2023 Speech Synthesis
METTS: Multilingual Emotional Text-to-Speech by Cross-speaker and Cross-lingual Emotion Transfer Jul 29, 2023 Disentanglement Diversity
Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding Jul 28, 2023 Language Modeling Language Modelling
An analysis on the effects of speaker embedding choice in non auto-regressive TTS Jul 19, 2023 Representation Learning Speech Synthesis
SLMGAN: Exploiting Speech Language Model Representations for Unsupervised Zero-Shot Voice Conversion in GANs Jul 18, 2023 Generative Adversarial Network Language Modeling
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis Jul 14, 2023 In-Context Learning Language Modelling
On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis Jul 11, 2023 Prediction Self-Supervised Learning
RobustL2S: Speaker-Specific Lip-to-Speech Synthesis exploiting Self-Supervised Representations Jul 3, 2023 Lip to Speech Synthesis Speaker-Specific Lip to Speech Synthesis
High-Quality Automatic Voice Over with Accurate Alignment: Supervision through Self-Supervised Discrete Speech Units Jun 29, 2023 Speech Synthesis text-to-speech
Large-scale unsupervised audio pre-training for video-to-speech synthesis Jun 27, 2023 speech-recognition Speech Recognition
DSE-TTS: Dual Speaker Embedding for Cross-Lingual Text-to-Speech Jun 25, 2023 Speech Synthesis text-to-speech
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale Jun 23, 2023 In-Context Learning Speech Synthesis (code available)
Strategies in Transfer Learning for Low-Resource Speech Synthesis: Phone Mapping, Features Input, and Source Language Selection Jun 21, 2023 Automatic Speech Recognition speech-recognition
Visual-Aware Text-to-Speech Jun 21, 2023 Rhythm Speech Synthesis
Cross-lingual Prosody Transfer for Expressive Machine Dubbing Jun 20, 2023 Expressive Speech Synthesis Speech Synthesis
CML-TTS A Multilingual Dataset for Speech Synthesis in Low-Resource Languages Jun 16, 2023 Speech Synthesis text-to-speech
Investigating the Utility of Surprisal from Large Language Models for Speech Synthesis Prosody Jun 16, 2023 Speech Synthesis
Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis Jun 15, 2023 Denoising Speech Synthesis
PauseSpeech: Natural Speech Synthesis via Pre-trained Language Model and Pause-based Prosody Modeling Jun 13, 2023 Language Modeling Language Modelling
HiddenSinger: High-Quality Singing Voice Synthesis via Neural Audio Codec and Latent Diffusion Models Jun 12, 2023 Denoising Singing Voice Synthesis
Boosting Fast and High-Quality Speech Synthesis with Linear Diffusion Jun 9, 2023 Denoising Speech Synthesis
VIFS: An End-to-End Variational Inference for Foley Sound Synthesis Jun 8, 2023 Speech Synthesis text-to-speech (code available)
Take the Hint: Improving Arabic Diacritization with Partially-Diacritized Text Jun 6, 2023 Speech Synthesis (code available)
PolyVoice: Language Models for Speech to Speech Translation Jun 5, 2023 Language Modeling Language Modelling
Rhythm-controllable Attention with High Robustness for Long Sentence Speech Synthesis Jun 5, 2023 Rhythm Sentence
Why We Should Report the Details in Subjective Evaluation of TTS More Rigorously Jun 3, 2023 Speech Synthesis (code available)
Speaker-independent neural formant synthesis Jun 2, 2023 Speech Synthesis
Speech inpainting: Context-based speech synthesis guided by video Jun 1, 2023 speech-recognition Speech Recognition
Text-to-Speech Pipeline for Swiss German -- A comparison May 31, 2023 Speech Synthesis text-to-speech