FaceSpeak: Expressive and High-Quality Speech Synthesis from Human Portraits of Different Styles | Jan 2, 2025 | Speech Synthesis, text-to-speech | — | Unverified | 0
Stable-TTS: Stable Speaker-Adaptive Text-to-Speech Synthesis via Prosody Prompting | Dec 28, 2024 | Speech Synthesis, text-to-speech | — | Unverified | 0
CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation | Dec 28, 2024 | Speech Synthesis | — | Unverified | 0
VoiceDiT: Dual-Condition Diffusion Transformer for Environment-Aware Speech Synthesis | Dec 26, 2024 | Audio Generation, Speech Synthesis | — | Unverified | 0
Intra- and Inter-modal Context Interaction Modeling for Conversational Speech Synthesis | Dec 25, 2024 | Contrastive Learning, Speech Synthesis | Code | Code Available | 0
MRI2Speech: Speech Synthesis from Articulatory Movements Recorded by Real-time MRI | Dec 25, 2024 | Decoder, Speech Synthesis | — | Unverified | 0
Autoregressive Speech Synthesis with Next-Distribution Prediction | Dec 22, 2024 | Language Modeling, Language Modelling | — | Unverified | 0
Incremental Disentanglement for Environment-Aware Zero-Shot Text-to-Speech Synthesis | Dec 22, 2024 | Decoder, Disentanglement | — | Unverified | 0
Deep Speech Synthesis from Multimodal Articulatory Representations | Dec 17, 2024 | Speech Synthesis, Transfer Learning | — | Unverified | 0
ProsodyFM: Unsupervised Phrasing and Intonation Control for Intelligible Speech Synthesis | Dec 16, 2024 | Speech Synthesis, text-to-speech | — | Unverified | 0
Region-Based Optimization in Continual Learning for Audio Deepfake Detection | Dec 16, 2024 | Audio Deepfake Detection, Continual Learning | Code | Code Available | 1
Efficient Generative Modeling with Residual Vector Quantization-Based Tokens | Dec 13, 2024 | Conditional Image Generation, Image Generation | — | Unverified | 0
AMuSeD: An Attentive Deep Neural Network for Multimodal Sarcasm Detection Incorporating Bi-modal Data Augmentation | Dec 13, 2024 | Data Augmentation, Sarcasm Detection | — | Unverified | 0
CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models | Dec 13, 2024 | In-Context Learning, Quantization | Code | Code Available | 11
Multimodal Latent Language Modeling with Next-Token Diffusion | Dec 11, 2024 | Image Generation, Language Modeling | Code | Code Available | 0
Zero-Shot Mono-to-Binaural Speech Synthesis | Dec 11, 2024 | Audio Synthesis, Denoising | — | Unverified | 0
Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey | Dec 9, 2024 | Speech Synthesis, Survey | Code | Code Available | 3
Analytic Study of Text-Free Speech Synthesis for Raw Audio using a Self-Supervised Learning Model | Dec 4, 2024 | Self-Supervised Learning, Speech Synthesis | — | Unverified | 0
Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis | Nov 26, 2024 | Decoder, multimodal generation | — | Unverified | 0
VQalAttent: a Transparent Speech Generation Pipeline based on Transformer-learned VQ-VAE Latent Space | Nov 22, 2024 | Audio Synthesis, Decoder | — | Unverified | 0
SmoothCache: A Universal Inference Acceleration Technique for Diffusion Transformers | Nov 15, 2024 | Image Generation, Speech Synthesis | Code | Code Available | 1
Debatts: Zero-Shot Debating Text-to-Speech Synthesis | Nov 10, 2024 | Speech Synthesis, text-to-speech | — | Unverified | 0
Complete reconstruction of the tongue contour through acoustic to articulatory inversion using real-time MRI data | Nov 4, 2024 | Speech Synthesis | — | Unverified | 0
Augmenting Polish Automatic Speech Recognition System With Synthetic Data | Oct 30, 2024 | Automatic Speech Recognition, speech-recognition | — | Unverified | 0
Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis | Oct 30, 2024 | Speech Synthesis, text-to-speech | Code | Code Available | 2
Fast and High-Quality Auto-Regressive Speech Synthesis via Speculative Decoding | Oct 29, 2024 | Speech Synthesis, text-to-speech | — | Unverified | 0
Mitigating Unauthorized Speech Synthesis for Voice Protection | Oct 28, 2024 | Data Augmentation, Face Swapping | Code | Code Available | 1
Get Large Language Models Ready to Speak: A Late-fusion Approach for Speech Generation | Oct 27, 2024 | parameter-efficient fine-tuning, Question Answering | — | Unverified | 0
Making Social Platforms Accessible: Emotion-Aware Speech Generation with Integrated Text Analysis | Oct 24, 2024 | Speech Synthesis, text-to-speech | — | Unverified | 0
STTATTS: Unified Speech-To-Text And Text-To-Speech Model | Oct 24, 2024 | Multi-Task Learning, speech-recognition | Code | Code Available | 1
Continuous Speech Synthesis using per-token Latent Diffusion | Oct 21, 2024 | Image Generation, Quantization | — | Unverified | 0
A Unified Framework for Collecting Text-to-Speech Synthesis Datasets for 22 Indian Languages | Oct 18, 2024 | Speech Synthesis, text-to-speech | — | Unverified | 0
DurIAN-E 2: Duration Informed Attention Network with Adaptive Variational Autoencoder and Adversarial Learning for Expressive Text-to-Speech Synthesis | Oct 17, 2024 | Speech Synthesis, text-to-speech | — | Unverified | 0
DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech | Oct 17, 2024 | Disentanglement, Quantization | — | Unverified | 0
Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding | Oct 17, 2024 | Speech Synthesis | — | Unverified | 0
Beyond Oversmoothing: Evaluating DDPM and MSE for Scalable Speech Synthesis in ASR | Oct 16, 2024 | Denoising, Speech Synthesis | — | Unverified | 0
Everyday Speech in the Indian Subcontinent | Oct 14, 2024 | Speech Synthesis | — | Unverified | 0
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis | Oct 14, 2024 | Denoising, Speaker Verification | — | Unverified | 0
Efficient training strategies for natural sounding speech synthesis and speaker adaptation based on FastPitch | Oct 9, 2024 | Speech Synthesis, text-to-speech | — | Unverified | 0
Bahasa Harmony: A Comprehensive Dataset for Bahasa Text-to-Speech Synthesis with Discrete Codec Modeling of EnGen-TTS | Oct 9, 2024 | Diversity, Speech Synthesis | — | Unverified | 0
HALL-E: Hierarchical Neural Codec Language Model for Minute-Long Zero-Shot Text-to-Speech Synthesis | Oct 6, 2024 | Language Modeling, Language Modelling | — | Unverified | 0
Adversarial Attacks and Robust Defenses in Speaker Embedding based Zero-Shot Text-to-Speech System | Oct 5, 2024 | Adversarial Purification, Speech Synthesis | — | Unverified | 0
Generative Semantic Communication for Text-to-Speech Synthesis | Oct 4, 2024 | Quantization, Semantic Communication | — | Unverified | 0
MultiVerse: Efficient and Expressive Zero-Shot Multi-Task Text-to-Speech | Oct 4, 2024 | Disentanglement, Speech Synthesis | — | Unverified | 0
EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control | Oct 1, 2024 | Emotional Speech Synthesis, Speech Synthesis | Code | Code Available | 2
Accent conversion using discrete units with parallel data synthesized from controllable accented TTS | Sep 30, 2024 | Data Augmentation, Speech Synthesis | — | Unverified | 0
Quantitative Analysis of Audio-Visual Tasks: An Information-Theoretic Perspective | Sep 29, 2024 | Audio-Visual Speech Recognition, Lip Reading | — | Unverified | 0
EmoPro: A Prompt Selection Strategy for Emotional Expression in LM-based Speech Synthesis | Sep 27, 2024 | Speech Synthesis | — | Unverified | 0
StyleFusion TTS: Multimodal Style-control and Enhanced Feature Fusion for Zero-shot Text-to-speech Synthesis | Sep 24, 2024 | Speech Synthesis, text-to-speech | — | Unverified | 0
Facial Expression-Enhanced TTS: Combining Face Representation and Emotion Intensity for Adaptive Speech | Sep 24, 2024 | Emotional Speech Synthesis, Speech Synthesis | — | Unverified | 0