Title | Date | Tasks | Code status
On the Opportunities of Green Computing: A Survey | Nov 1, 2023 | Fairness, Speech Synthesis | Unverified 0
Boosting Multi-Speaker Expressive Speech Synthesis with Semi-supervised Contrastive Learning | Oct 26, 2023 | Contrastive Learning, Expressive Speech Synthesis | Unverified 0
Controllable Generation of Artificial Speaker Embeddings through Discovery of Principal Directions | Oct 26, 2023 | Speech Synthesis | Unverified 0
Generative Pre-training for Speech with Flow Matching | Oct 25, 2023 | Speech Enhancement, Speech Synthesis | Unverified 0
ArTST: Arabic Text and Speech Transformer | Oct 25, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available 1
AutoDiff: combining Auto-encoder and Diffusion model for tabular data synthesizing | Oct 24, 2023 | Language Modeling, Language Modelling | Code Available 1
Energy-Based Models For Speech Synthesis | Oct 19, 2023 | Speech Synthesis | Unverified 0
SelfVC: Voice Conversion With Iterative Refinement using Self Transformations | Oct 14, 2023 | Self-Supervised Learning, Speaker Verification | Unverified 0
Generative Adversarial Training for Text-to-Speech Synthesis Based on Raw Phonetic Input and Explicit Prosody Modelling | Oct 14, 2023 | Speech Synthesis, text-to-speech | Code Available 2
Attentive Multi-Layer Perceptron for Non-autoregressive Generation | Oct 14, 2023 | Machine Translation, Speech Synthesis | Code Available 0
Speaking rate attention-based duration prediction for speed control TTS | Oct 13, 2023 | Attribute, Speech Synthesis | Unverified 0
Privacy-oriented manipulation of speaker representations | Oct 10, 2023 | Speaker Recognition, Speech Synthesis | Unverified 0
Neutral TTS Female Voice Corpus in Brazilian Portuguese | Oct 8, 2023 | Speech Synthesis, text-to-speech | Unverified 0
Partial Rank Similarity Minimization Method for Quality MOS Prediction of Unseen Speech Synthesis Systems in Zero-Shot and Semi-supervised setting | Oct 8, 2023 | Prediction, Speech Synthesis | Code Available 0
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT | Oct 7, 2023 | Audio captioning, Automatic Speech Recognition | Code Available 2
Latent Filling: Latent Space Data Augmentation for Zero-shot Speech Synthesis | Oct 5, 2023 | Data Augmentation, Speech Synthesis | Unverified 0
The VoiceMOS Challenge 2023: Zero-shot Subjective Speech Quality Prediction for Multiple Domains | Oct 4, 2023 | Speech Synthesis, text-to-speech | Unverified 0
Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech | Oct 1, 2023 | speech-recognition, Speech Recognition | Code Available 1
High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models | Sep 27, 2023 | All, Speech Synthesis | Unverified 0
Collaborative Watermarking for Adversarial Speech Synthesis | Sep 26, 2023 | Speaker Verification, Speech Synthesis | Unverified 0
Face-StyleSpeech: Enhancing Zero-shot Speech Synthesis from Face Images with Improved Face-to-Speech Mapping | Sep 25, 2023 | Speech Synthesis, text-to-speech | Unverified 0
P-Flow: A Fast and Data-Efficient Zero-Shot TTS through Speech Prompting | Sep 22, 2023 | Decoder, Speech Synthesis | Code Available 2
DurIAN-E: Duration Informed Attention Network For Expressive Text-to-Speech Synthesis | Sep 22, 2023 | Denoising, Speech Synthesis | Unverified 0
A Discourse-level Multi-scale Prosodic Model for Fine-grained Emotion Analysis | Sep 21, 2023 | Emotion Recognition, Speech Synthesis | Unverified 0
Speak While You Think: Streaming Speech Synthesis During Text Generation | Sep 20, 2023 | Speech Synthesis, Text Generation | Unverified 0
Towards Joint Modeling of Dialogue Response and Speech Synthesis based on Large Language Model | Sep 20, 2023 | Chatbot, Language Modeling | Code Available 1
Exploring Speech Enhancement for Low-resource Speech Synthesis | Sep 19, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified 0
Leveraging Speech PTM, Text LLM, and Emotional TTS for Speech Emotion Recognition | Sep 19, 2023 | Data Augmentation, Emotion Recognition | Unverified 0
Corpus Synthesis for Zero-shot ASR domain Adaptation using Large Language Models | Sep 18, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified 0
Speech Synthesis By Unrolling Diffusion Process using Neural Network Layers | Sep 18, 2023 | Denoising, Speech Synthesis | Unverified 0
HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform | Sep 18, 2023 | Speech Synthesis | Code Available 2
Cross-lingual Knowledge Distillation via Flow-based Voice Conversion for Robust Polyglot Text-To-Speech | Sep 15, 2023 | Knowledge Distillation, Speech Synthesis | Unverified 0
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS | Sep 14, 2023 | Self-Supervised Learning, speech-recognition | Unverified 0
Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks | Sep 14, 2023 | Decoder, Language Modeling | Unverified 0
FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec | Sep 14, 2023 | Automatic Speech Recognition, speech-recognition | Code Available 2
CleanUNet 2: A Hybrid Speech Denoising Model on Waveform and Spectrogram | Sep 12, 2023 | Denoising, Speech Denoising | Unverified 0
Can large-scale vocoded spoofed data improve speech spoofing countermeasure with a self-supervised front end? | Sep 12, 2023 | Self-Supervised Learning, Speech Synthesis | Unverified 0
Cross-Utterance Conditioned VAE for Speech Generation | Sep 8, 2023 | Speech Synthesis, text-to-speech | Unverified 0
MuLanTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2023 | Sep 6, 2023 | Speech Synthesis, text-to-speech | Unverified 0
BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network | Sep 6, 2023 | Generative Adversarial Network, Speech Synthesis | Code Available 2
Matcha-TTS: A fast TTS architecture with conditional flow matching | Sep 6, 2023 | Acoustic Modelling, Decoder | Code Available 3
The FruitShell French synthesis system at the Blizzard 2023 Challenge | Sep 1, 2023 | Data Augmentation, Speech Synthesis | Unverified 0
QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning | Aug 31, 2023 | Representation Learning, Speech Representation Learning | Code Available 1
Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis | Aug 31, 2023 | Expressive Speech Synthesis, Sentence | Unverified 0
The DeepZen Speech Synthesis System for Blizzard Challenge 2023 | Aug 30, 2023 | Sentence, Speech Synthesis | Unverified 0
Generalizable Zero-Shot Speaker Adaptive Speech Synthesis with Disentangled Representations | Aug 24, 2023 | Representation Learning, Speech Synthesis | Unverified 0
TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition | Aug 21, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified 0
AffectEcho: Speaker Independent and Language-Agnostic Emotion and Affect Transfer for Speech Synthesis | Aug 16, 2023 | Attribute, Speech Synthesis | Unverified 0
Accurate synthesis of Dysarthric Speech for ASR data augmentation | Aug 16, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified 0
DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding | Aug 15, 2023 | Speech Synthesis | Code Available 1