ITAcotron 2: Transfering English Speech Synthesis Architectures and Speech Features to Italian Nov 1, 2021 Speech Synthesis
Byakto Speech: Real-time long speech synthesis with convolutional neural network: Transfer learning from English to Bangla May 31, 2021 Deep Learning speech-recognition
Region-Based Optimization in Continual Learning for Audio Deepfake Detection Dec 16, 2024 Audio Deepfake Detection Continual Learning
Towards Voice Reconstruction from EEG during Imagined Speech Jan 2, 2023 Automatic Speech Recognition Automatic Speech Recognition (ASR)
KazEmoTTS: A Dataset for Kazakh Emotional Text-to-Speech Synthesis Apr 1, 2024 Speech Synthesis text-to-speech
Detection of Prosodic Boundaries in Speech Using Wav2Vec 2.0 Sep 29, 2022 Sentence Speech Synthesis
Requirements and Motivations of Low-Resource Speech Synthesis for Language Revitalization May 1, 2022 Speech Synthesis
Learning Arousal-Valence Representation from Categorical Emotion Labels of Speech Nov 24, 2023 Dimensionality Reduction Emotion Classification
Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models May 21, 2025 Bayesian Optimization Speech Synthesis
Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis May 17, 2020 Lip Reading Lip to Speech Synthesis
Developing multilingual speech synthesis system for Ojibwe, Mi'kmaq, and Maliseet Feb 4, 2025 Speech Synthesis text-to-speech
Retrieval-Augmented Dialogue Knowledge Aggregation for Expressive Conversational Speech Synthesis Jan 11, 2025 Attribute Benchmarking
CDPAM: Contrastive learning for perceptual audio similarity Feb 9, 2021 Contrastive Learning Speech Enhancement
A Resource for Computational Experiments on Mapudungun Dec 4, 2019 Machine Translation speech-recognition
Deep Speech Synthesis from MRI-Based Articulatory Representations Jul 5, 2023 Computational Efficiency Denoising
Deep Speech Synthesis from Articulatory Representations Sep 13, 2022 Speech Synthesis
ADAPTERMIX: Exploring the Efficacy of Mixture of Adapters for Low-Resource TTS Adaptation May 29, 2023 Speech Synthesis text-to-speech
Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding Aug 12, 2020 Speech Synthesis text-to-speech
Rasa: Building Expressive Speech Synthesis Systems for Indian Languages in Low-resource Settings Jul 19, 2024 Expressive Speech Synthesis Speech Synthesis
Articulation GAN: Unsupervised modeling of articulatory learning Oct 27, 2022 Generative Adversarial Network Speech Synthesis
MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis Oct 8, 2019 CPU GPU
Voice Spoofing Countermeasures: Taxonomy, State-of-the-art, experimental analysis of generalizability, open challenges, and the way forward Oct 2, 2022 Misinformation Speaker Verification
RF-Next: Efficient Receptive Field Search for Convolutional Neural Networks Jun 14, 2022 Action Segmentation Instance Segmentation
PRESENT: Zero-Shot Text-to-Prosody Control Aug 13, 2024 Prosody Prediction Speech Synthesis
Generative Expressive Conversational Speech Synthesis Jul 31, 2024 Speech Synthesis
Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech Oct 8, 2021 Emotion Interpretation Expressive Speech Synthesis
Deep Learning Based Assessment of Synthetic Speech Naturalness Apr 23, 2021 Deep Learning Prediction
UMETTS: A Unified Framework for Emotional Text-to-Speech Synthesis with Multimodal Prompts Apr 29, 2024 Contrastive Learning Speech Synthesis
Cross-modal information fusion for voice spoofing detection Feb 1, 2023 Automatic Speech Recognition fake voice detection
MnTTS: An Open-Source Mongolian Text-to-Speech Synthesis Dataset and Accompanied Baseline Sep 22, 2022 Speech Synthesis text-to-speech
ASR data augmentation in low-resource settings using cross-lingual multi-speaker TTS and cross-lingual voice conversion Mar 29, 2022 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech Feb 27, 2023 Speech Synthesis text-to-speech
A Spectral Energy Distance for Parallel Speech Synthesis Aug 3, 2020 scoring rule Speech Synthesis
QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning Aug 31, 2023 Representation Learning Speech Representation Learning
ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on Pitch and Speed Sep 23, 2022 Pitch control Speech Synthesis
Deep Learning Enabled Semantic Communications with Speech Recognition and Synthesis May 9, 2022 Deep Learning Semantic Communication
Assem-VC: Realistic Voice Conversion by Assembling Modern Speech Synthesis Techniques Apr 2, 2021 Decoder Rhythm
RAD-TTS: Parallel Flow-Based TTS with Robust Alignment Learning and Diverse Synthesis Jun 2, 2021 Diversity Rhythm
Show Me Your Face, And I'll Tell You How You Speak Jun 28, 2022 Lip to Speech Synthesis Speech Synthesis
Synthesizing Speech from Intracranial Depth Electrodes using an Encoder-Decoder Framework Nov 2, 2021 Decoder EEG
Parallel WaveNet: Fast High-Fidelity Speech Synthesis Nov 28, 2017 Speech Synthesis Vocal Bursts Intensity Prediction
A Critical Review of Recurrent Neural Networks for Sequence Learning May 29, 2015 Handwriting Recognition Image Captioning
Partial Rank Similarity Minimization Method for Quality MOS Prediction of Unseen Speech Synthesis Systems in Zero-Shot and Semi-supervised setting Oct 8, 2023 Prediction Speech Synthesis
NVC-Net: End-to-End Adversarial Voice Conversion Jun 2, 2021 GPU Speech Synthesis
OmniDRCA: Parallel Speech-Text Foundation Model via Dual-Resolution Speech Representations and Contrastive Alignment Jun 11, 2025 cross-modal alignment Question Answering
Neural Voice Cloning with a Few Samples Feb 14, 2018 Speech Synthesis Voice Cloning
One-Class Learning with Adaptive Centroid Shift for Audio Deepfake Detection Jun 24, 2024 Audio Deepfake Detection DeepFake Detection
Phrase break prediction with bidirectional encoder representations in Japanese text-to-speech synthesis Apr 26, 2021 Language Modeling Language Modelling
Comparison of Speech Representations for Automatic Quality Estimation in Multi-Speaker Text-to-Speech Synthesis Feb 28, 2020 Speech Synthesis text-to-speech
Neural Autoregressive Flows Apr 3, 2018 Density Estimation Speech Synthesis