Balancing Speech Understanding and Generation Using Continual Pre-training for Codec-based Speech LLM | Feb 24, 2025 | Automatic Speech Recognition, Language Modeling | Unverified | 0
High-Fidelity Music Vocoder using Neural Audio Codecs | Feb 18, 2025 | Decoder, Speech Synthesis | Unverified | 0
AV-Flow: Transforming Text to Audio-Visual Human-like Interactions | Feb 18, 2025 | Speech Synthesis | Unverified | 0
A Survey on Bridging EEG Signals and Generative AI: From Image and Text to Beyond | Feb 17, 2025 | Contrastive Learning, EEG | Unverified | 0
NaturalL2S: End-to-End High-quality Multispeaker Lip-to-Speech Synthesis with Differential Digital Signal Processing | Feb 17, 2025 | Lip to Speech Synthesis, speech-recognition | Unverified | 0
FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching | Feb 16, 2025 | Language Modeling, Language Modelling | Unverified | 0
ASVspoof 5: Design, Collection and Validation of Resources for Spoofing, Deepfake, and Adversarial Attack Detection Using Crowdsourced Speech | Feb 13, 2025 | Adversarial Attack, Adversarial Attack Detection | Unverified | 0
LoRP-TTS: Low-Rank Personalized Text-To-Speech | Feb 11, 2025 | Speech Synthesis, text-to-speech | Unverified | 0
Non-invasive electromyographic speech neuroprosthesis: a geometric perspective | Feb 9, 2025 | Speech Synthesis | Unverified | 0
Gender Bias in Instruction-Guided Speech Synthesis Models | Feb 8, 2025 | Expressive Speech Synthesis, Speech Synthesis | Unverified | 0
Continuous Autoregressive Modeling with Stochastic Monotonic Alignment for Speech Synthesis | Feb 3, 2025 | Quantization, Speech Synthesis | Unverified | 0
Compact Neural TTS Voices for Accessibility | Jan 28, 2025 | Speech Synthesis, text-to-speech | Unverified | 0
Generalizable Audio Deepfake Detection via Latent Space Refinement and Augmentation | Jan 24, 2025 | Audio Deepfake Detection, DeepFake Detection | Unverified | 0
Generative Data Augmentation Challenge: Zero-Shot Speech Synthesis for Personalized Speech Enhancement | Jan 23, 2025 | Data Augmentation, Speech Enhancement | Unverified | 0
Speech Synthesis along Perceptual Voice Quality Dimensions | Jan 15, 2025 | Expressive Speech Synthesis, Speech Synthesis | Unverified | 0
A Non-autoregressive Model for Joint STT and TTS | Jan 15, 2025 | Automatic Speech Recognition, speech-recognition | Unverified | 0
Exploring the encoding of linguistic representations in the Fully-Connected Layer of generative CNNs for Speech | Jan 13, 2025 | Speech Synthesis | Unverified | 0
Low-Resource Text-to-Speech Synthesis Using Noise-Augmented Training of ForwardTacotron | Jan 10, 2025 | Speech Synthesis, text-to-speech | Unverified | 0
TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer | Jan 10, 2025 | speech-recognition, Speech Recognition | Unverified | 0
PROEMO: Prompt-Driven Text-to-Speech Synthesis Based on Emotion and Intensity Control | Jan 10, 2025 | Speech Synthesis, text-to-speech | Unverified | 0
Probing Speaker-specific Features in Speaker Representations | Jan 9, 2025 | Self-Supervised Learning, Speaker Verification | Unverified | 0
JELLY: Joint Emotion Recognition and Context Reasoning with LLMs for Conversational Speech Synthesis | Jan 9, 2025 | Emotion Recognition, Language Modeling | Unverified | 0
FleSpeech: Flexibly Controllable Speech Generation with Various Prompts | Jan 8, 2025 | Speech Synthesis | Unverified | 0
FaceSpeak: Expressive and High-Quality Speech Synthesis from Human Portraits of Different Styles | Jan 2, 2025 | Speech Synthesis, text-to-speech | Unverified | 0
Stable-TTS: Stable Speaker-Adaptive Text-to-Speech Synthesis via Prosody Prompting | Dec 28, 2024 | Speech Synthesis, text-to-speech | Unverified | 0
CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation | Dec 28, 2024 | Speech Synthesis | Unverified | 0
VoiceDiT: Dual-Condition Diffusion Transformer for Environment-Aware Speech Synthesis | Dec 26, 2024 | Audio Generation, Speech Synthesis | Unverified | 0
MRI2Speech: Speech Synthesis from Articulatory Movements Recorded by Real-time MRI | Dec 25, 2024 | Decoder, Speech Synthesis | Unverified | 0
Intra- and Inter-modal Context Interaction Modeling for Conversational Speech Synthesis | Dec 25, 2024 | Contrastive Learning, Speech Synthesis | Code Available | 0
Incremental Disentanglement for Environment-Aware Zero-Shot Text-to-Speech Synthesis | Dec 22, 2024 | Decoder, Disentanglement | Unverified | 0
Autoregressive Speech Synthesis with Next-Distribution Prediction | Dec 22, 2024 | Language Modeling, Language Modelling | Unverified | 0
Deep Speech Synthesis from Multimodal Articulatory Representations | Dec 17, 2024 | Speech Synthesis, Transfer Learning | Unverified | 0
ProsodyFM: Unsupervised Phrasing and Intonation Control for Intelligible Speech Synthesis | Dec 16, 2024 | Speech Synthesis, text-to-speech | Unverified | 0
Efficient Generative Modeling with Residual Vector Quantization-Based Tokens | Dec 13, 2024 | Conditional Image Generation, Image Generation | Unverified | 0
AMuSeD: An Attentive Deep Neural Network for Multimodal Sarcasm Detection Incorporating Bi-modal Data Augmentation | Dec 13, 2024 | Data Augmentation, Sarcasm Detection | Unverified | 0
Multimodal Latent Language Modeling with Next-Token Diffusion | Dec 11, 2024 | Image Generation, Language Modeling | Code Available | 0
Zero-Shot Mono-to-Binaural Speech Synthesis | Dec 11, 2024 | Audio Synthesis, Denoising | Unverified | 0
Analytic Study of Text-Free Speech Synthesis for Raw Audio using a Self-Supervised Learning Model | Dec 4, 2024 | Self-Supervised Learning, Speech Synthesis | Unverified | 0
Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis | Nov 26, 2024 | Decoder, multimodal generation | Unverified | 0
VQalAttent: a Transparent Speech Generation Pipeline based on Transformer-learned VQ-VAE Latent Space | Nov 22, 2024 | Audio Synthesis, Decoder | Unverified | 0
Debatts: Zero-Shot Debating Text-to-Speech Synthesis | Nov 10, 2024 | Speech Synthesis, text-to-speech | Unverified | 0
Complete reconstruction of the tongue contour through acoustic to articulatory inversion using real-time MRI data | Nov 4, 2024 | Speech Synthesis | Unverified | 0
Augmenting Polish Automatic Speech Recognition System With Synthetic Data | Oct 30, 2024 | Automatic Speech Recognition, speech-recognition | Unverified | 0
Fast and High-Quality Auto-Regressive Speech Synthesis via Speculative Decoding | Oct 29, 2024 | Speech Synthesis, text-to-speech | Unverified | 0
Get Large Language Models Ready to Speak: A Late-fusion Approach for Speech Generation | Oct 27, 2024 | parameter-efficient fine-tuning, Question Answering | Unverified | 0
Making Social Platforms Accessible: Emotion-Aware Speech Generation with Integrated Text Analysis | Oct 24, 2024 | Speech Synthesis, text-to-speech | Unverified | 0
Continuous Speech Synthesis using per-token Latent Diffusion | Oct 21, 2024 | Image Generation, Quantization | Unverified | 0
A Unified Framework for Collecting Text-to-Speech Synthesis Datasets for 22 Indian Languages | Oct 18, 2024 | Speech Synthesis, text-to-speech | Unverified | 0
Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding | Oct 17, 2024 | Speech Synthesis | Unverified | 0
DurIAN-E 2: Duration Informed Attention Network with Adaptive Variational Autoencoder and Adversarial Learning for Expressive Text-to-Speech Synthesis | Oct 17, 2024 | Speech Synthesis, text-to-speech | Unverified | 0