DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech (Oct 17, 2024). Tasks: Disentanglement, Quantization. Code: unverified.
Beyond Oversmoothing: Evaluating DDPM and MSE for Scalable Speech Synthesis in ASR (Oct 16, 2024). Tasks: Denoising, Speech Synthesis. Code: unverified.
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis (Oct 14, 2024). Tasks: Denoising, Speaker Verification. Code: unverified.
Everyday Speech in the Indian Subcontinent (Oct 14, 2024). Tasks: Speech Synthesis. Code: unverified.
Efficient training strategies for natural sounding speech synthesis and speaker adaptation based on FastPitch (Oct 9, 2024). Tasks: Speech Synthesis, text-to-speech. Code: unverified.
Bahasa Harmony: A Comprehensive Dataset for Bahasa Text-to-Speech Synthesis with Discrete Codec Modeling of EnGen-TTS (Oct 9, 2024). Tasks: Diversity, Speech Synthesis. Code: unverified.
HALL-E: Hierarchical Neural Codec Language Model for Minute-Long Zero-Shot Text-to-Speech Synthesis (Oct 6, 2024). Tasks: Language Modeling, Language Modelling. Code: unverified.
Adversarial Attacks and Robust Defenses in Speaker Embedding based Zero-Shot Text-to-Speech System (Oct 5, 2024). Tasks: Adversarial Purification, Speech Synthesis. Code: unverified.
MultiVerse: Efficient and Expressive Zero-Shot Multi-Task Text-to-Speech (Oct 4, 2024). Tasks: Disentanglement, Speech Synthesis. Code: unverified.
Generative Semantic Communication for Text-to-Speech Synthesis (Oct 4, 2024). Tasks: Quantization, Semantic Communication. Code: unverified.
Accent conversion using discrete units with parallel data synthesized from controllable accented TTS (Sep 30, 2024). Tasks: Data Augmentation, Speech Synthesis. Code: unverified.
Quantitative Analysis of Audio-Visual Tasks: An Information-Theoretic Perspective (Sep 29, 2024). Tasks: Audio-Visual Speech Recognition, Lip Reading. Code: unverified.
EmoPro: A Prompt Selection Strategy for Emotional Expression in LM-based Speech Synthesis (Sep 27, 2024). Tasks: Speech Synthesis. Code: unverified.
Facial Expression-Enhanced TTS: Combining Face Representation and Emotion Intensity for Adaptive Speech (Sep 24, 2024). Tasks: Emotional Speech Synthesis, Speech Synthesis. Code: unverified.
StyleFusion TTS: Multimodal Style-control and Enhanced Feature Fusion for Zero-shot Text-to-speech Synthesis (Sep 24, 2024). Tasks: Speech Synthesis, text-to-speech. Code: unverified.
Audio Codec Augmentation for Robust Collaborative Watermarking of Speech Synthesis (Sep 20, 2024). Tasks: Face Swapping, Speech Synthesis. Code: available.
NDVQ: Robust Neural Audio Codec with Normal Distribution-Based Vector Quantization (Sep 19, 2024). Tasks: Audio Compression, Audio Generation. Code: unverified.
Single-stage TTS with Masked Audio Token Modeling and Semantic Knowledge Distillation (Sep 17, 2024). Tasks: Knowledge Distillation, Speech Synthesis. Code: unverified.
Enhancing Multilingual Speech Generation and Recognition Abilities in LLMs with Constructed Code-switched Data (Sep 17, 2024). Tasks: Speech Synthesis. Code: unverified.
Emo-DPO: Controllable Emotional Speech Synthesis through Direct Preference Optimization (Sep 16, 2024). Tasks: Emotional Speech Synthesis, In-Context Learning. Code: unverified.
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion (Sep 16, 2024). Tasks: Speech Synthesis, text-to-speech. Code: unverified.
Improving Robustness of Diffusion-Based Zero-Shot Speech Synthesis via Stable Formant Generation (Sep 14, 2024). Tasks: Speech Synthesis, text-to-speech. Code: unverified.
Text-To-Speech Synthesis In The Wild (Sep 13, 2024). Tasks: Benchmarking, Speaker Recognition. Code: unverified.
LLM-Powered Grapheme-to-Phoneme Conversion: Benchmark and Case Study (Sep 13, 2024). Tasks: Benchmarking, Grapheme-to-Phoneme Conversion. Code: unverified.
Full-text Error Correction for Chinese Speech Recognition with Large Language Model (Sep 12, 2024). Tasks: Automatic Speech Recognition, Automatic Speech Recognition (ASR). Code: unverified.
Enhancing Kurdish Text-to-Speech with Native Corpus Training: A High-Quality WaveGlow Vocoder Approach (Sep 10, 2024). Tasks: Speech Synthesis, text-to-speech. Code: unverified.
What happens to diffusion model likelihood when your model is conditional? (Sep 10, 2024). Tasks: domain classification, model. Code: unverified.
AS-Speech: Adaptive Style For Speech Synthesis (Sep 9, 2024). Tasks: Rhythm, Speech Synthesis. Code: unverified.
Fast, High-Quality and Parameter-Efficient Articulatory Synthesis using Differentiable DSP (Sep 4, 2024). Tasks: Audio Synthesis, Computational Efficiency. Code: unverified.
VoxHakka: A Dialectally Diverse Multi-speaker Text-to-Speech System for Taiwanese Hakka (Sep 3, 2024). Tasks: Automatic Speech Recognition, Automatic Speech Recognition (ASR). Code: unverified.
vec2wav 2.0: Advancing Voice Conversion via Discrete Token Vocoders (Sep 3, 2024). Tasks: Speech Synthesis, Voice Conversion. Code: unverified.
SelectTTS: Synthesizing Anyone's Voice via Discrete Unit-Based Frame Selection (Aug 30, 2024). Tasks: Self-Supervised Learning, Speech Synthesis. Code: unverified.
Literary and Colloquial Dialect Identification for Tamil using Acoustic Features (Aug 27, 2024). Tasks: Automatic Speech Recognition, Dialect Identification. Code: unverified.
Which Prosodic Features Matter Most for Pragmatics? (Aug 23, 2024). Tasks: Speech Synthesis. Code: unverified.
AI-Based IVR (Aug 20, 2024). Tasks: Speech Synthesis, Speech-to-Text. Code: unverified.
Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition (Aug 17, 2024). Tasks: Language Modeling, Language Modelling. Code: available.
WavLM model ensemble for audio deepfake detection (Aug 14, 2024). Tasks: Audio Deepfake Detection, Data Augmentation. Code: available.
SaSLaW: Dialogue Speech Corpus with Audio-visual Egocentric Information Toward Environment-adaptive Dialogue Speech Synthesis (Aug 13, 2024). Tasks: Speech Synthesis, Spoken Dialogue Systems. Code: available.
VNet: A GAN-based Multi-Tier Discriminator Network for Speech Synthesis Vocoders (Aug 13, 2024). Tasks: Speech Synthesis. Code: unverified.
Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation (Aug 1, 2024). Tasks: Representation Learning, Speech Synthesis. Code: unverified.
Speech Bandwidth Expansion Via High Fidelity Generative Adversarial Networks (Jul 26, 2024). Tasks: Generative Adversarial Network, Speech Enhancement. Code: unverified.
Towards Improving NAM-to-Speech Synthesis Intelligibility using Self-Supervised Speech Models (Jul 26, 2024). Tasks: Speech Synthesis. Code: unverified.
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning (Jul 21, 2024). Tasks: Representation Learning, Self-Supervised Learning. Code: unverified.
MSceneSpeech: A Multi-Scene Speech Dataset For Expressive Speech Synthesis (Jul 19, 2024). Tasks: Expressive Speech Synthesis, Speech Synthesis. Code: unverified.
Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models (Jul 18, 2024). Tasks: Language Modeling, Language Modelling. Code: unverified.
Autoregressive Speech Synthesis without Vector Quantization (Jul 11, 2024). Tasks: Audio Compression, Diversity. Code: unverified.
Toward accessible comics for blind and low vision readers (Jul 11, 2024). Tasks: Optical Character Recognition, Prompt Engineering. Code: unverified.
Analyzing Speech Unit Selection for Textless Speech-to-Speech Translation (Jul 8, 2024). Tasks: Automatic Speech Recognition, Emotion Recognition. Code: unverified.
We Need Variations in Speech Generation: Sub-center Modelling for Speaker Embeddings (Jul 5, 2024). Tasks: Speaker Recognition, Speech Synthesis. Code: unverified.
FA-GAN: Artifacts-free and Phase-aware High-fidelity GAN-based Vocoder (Jul 5, 2024). Tasks: Generative Adversarial Network, Speech Synthesis. Code: unverified.