Effective Deep Learning Models for Automatic Diacritization of Arabic Text Nov 1, 2020 Arabic Text Diacritization Decoder
Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder Nov 7, 2022 Speech Synthesis text-to-speech
Automatic Prosody Annotation with Pre-Trained Text-Speech Model Jun 16, 2022 Speech Synthesis text-to-speech
EdiTTS: Score-based Editing for Controllable Text-to-Speech Oct 6, 2021 Speech Synthesis Speech-to-Text
RAD-TTS: Parallel Flow-Based TTS with Robust Alignment Learning and Diverse Synthesis Jun 2, 2021 Diversity Rhythm
Neural HMMs are all you need (for high-quality attention-free TTS) Aug 30, 2021 All Speech Synthesis
Disentanglement in a GAN for Unconditional Speech Synthesis Jul 4, 2023 Disentanglement Generative Adversarial Network
Digital Voicing of Silent Speech Oct 6, 2020 Electromyography (EMG) Speech Synthesis
dMel: Speech Tokenization made Simple Jul 22, 2024 Decoder Language Modeling
Neural Text to Articulate Talk: Deep Text to Audiovisual Speech Synthesis achieving both Auditory and Photo-realism Dec 11, 2023 Face Generation Lip Reading
AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder Jan 9, 2025 Pitch Classification Pitch control
Automatic Tuning of Loss Trade-offs without Hyper-parameter Search in End-to-End Zero-Shot Speech Synthesis May 26, 2023 Decoder Speech Synthesis
Diffusion-Based Voice Conversion with Fast Maximum Likelihood Sampling Scheme Sep 28, 2021 Speech Synthesis Voice Conversion
EfficientNet-Absolute Zero for Continuous Speech Keyword Spotting Dec 31, 2020 Keyword Spotting Keyword Spotting CSS
A Neuro-AI Interface for Evaluating Generative Adversarial Networks Mar 5, 2020 Speech Synthesis
AutoDiff: combining Auto-encoder and Diffusion model for tabular data synthesizing Oct 24, 2023 Language Modeling Language Modelling
DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding Aug 15, 2023 Speech Synthesis
Diffusion-Based Mel-Spectrogram Enhancement for Personalized Speech Synthesis with Found Data May 18, 2023 Speech Enhancement Speech Synthesis
DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training Jul 31, 2023 Denoising Expressive Speech Synthesis
DiffWave: A Versatile Diffusion Model for Audio Synthesis Sep 21, 2020 Audio Synthesis Diversity
Synthetic-Neuroscore: Using A Neuro-AI Interface for Evaluating Generative Adversarial Networks May 10, 2019 Image Generation Speech Synthesis
TTS-Portuguese Corpus: a corpus for speech synthesis in Brazilian Portuguese May 11, 2020 Denoising Speech Synthesis
Developing multilingual speech synthesis system for Ojibwe, Mi'kmaq, and Maliseet Feb 4, 2025 Speech Synthesis text-to-speech
Detection of Prosodic Boundaries in Speech Using Wav2Vec 2.0 Sep 29, 2022 Sentence Speech Synthesis
NanoFlow: Scalable Normalizing Flows with Sublinear Parameter Complexity Jun 11, 2020 Density Estimation Normalising Flows
Deep Speech Synthesis from MRI-Based Articulatory Representations Jul 5, 2023 Computational Efficiency Denoising
Exploring Transfer Learning for Low Resource Emotional TTS Jan 14, 2019 Deep Learning Emotional Speech Synthesis
Deep Speech Synthesis from Articulatory Representations Sep 13, 2022 Speech Synthesis
Multilingual Byte2Speech Models for Scalable Low-resource Speech Synthesis Mar 5, 2021 Speech Synthesis
Deep Learning Enabled Semantic Communications with Speech Recognition and Synthesis May 9, 2022 Deep Learning Semantic Communication
Dynamical Variational Autoencoders: A Comprehensive Review Aug 28, 2020 3D Human Dynamics Resynthesis
FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis Jun 29, 2021 Speech Synthesis text-to-speech
Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration May 25, 2023 Speech Synthesis text-to-speech
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions Dec 16, 2017 Speech Synthesis
One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech Aug 3, 2020 Meta-Learning Speech Synthesis
Fine-grained style control in Transformer-based Text-to-speech Synthesis Oct 12, 2021 Inductive Bias Speech Synthesis
FonBund: A Library for Combining Cross-lingual Phonological Segment Data May 1, 2018 Language Modeling Language Modelling
FMFCC-A: A Challenging Mandarin Dataset for Synthetic Speech Detection Oct 18, 2021 Speech Synthesis Synthetic Speech Detection
Cross-modal information fusion for voice spoofing detection Feb 1, 2023 Automatic Speech Recognition fake voice detection
Bts-e: Audio deepfake detection using breathing-talking-silence encoder May 5, 2023 Audio Deepfake Detection DeepFake Detection
Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech Oct 8, 2021 Emotion Interpretation Expressive Speech Synthesis
Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models May 21, 2025 Bayesian Optimization Speech Synthesis
Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech May 13, 2021 Decoder Speech Synthesis
MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset Dec 11, 2022 Speech Synthesis text-to-speech
APNet2: High-quality and High-efficiency Neural Vocoder with Direct Prediction of Amplitude and Phase Spectra Nov 20, 2023 Speech Synthesis
ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on Pitch and Speed Sep 23, 2022 Pitch control Speech Synthesis
Mixer-TTS: non-autoregressive, fast and compact text-to-speech model conditioned on language model embeddings Oct 7, 2021 Language Modeling Language Modelling
ADAPTERMIX: Exploring the Efficacy of Mixture of Adapters for Low-Resource TTS Adaptation May 29, 2023 Speech Synthesis text-to-speech
Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding Aug 12, 2020 Speech Synthesis text-to-speech
Conditional Sound Generation Using Neural Discrete Time-Frequency Representation Learning Jul 21, 2021 Diversity Music Generation