Multimodal Latent Language Modeling with Next-Token Diffusion Dec 11, 2024 Image Generation Language Modeling
Mlphon: A Multifunctional Grapheme-Phoneme Conversion Tool Using Finite State Transducers Sep 5, 2022 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Comparison of Speech Representations for Automatic Quality Estimation in Multi-Speaker Text-to-Speech Synthesis Feb 28, 2020 Speech Synthesis text-to-speech
Mixed-Precision Training for NLP and Speech Recognition with OpenSeq2Seq May 25, 2018 Automatic Speech Recognition Automatic Speech Recognition (ASR)
Improving Self-Supervised Learning-based MOS Prediction Networks Apr 23, 2022 Prediction Quantization
Articulatory Feature Prediction from Surface EMG during Speech Production May 20, 2025 Electromyography (EMG) Speech Synthesis
MelNet: A Generative Model for Audio in the Frequency Domain Jun 4, 2019 Audio Generation Music Generation
Maximizing Mutual Information for Tacotron Aug 30, 2019 Attribute Speech Synthesis
SampleRNN: An Unconditional End-to-End Neural Audio Generation Model Dec 22, 2016 Audio Generation Speech Synthesis
Speech waveform synthesis from MFCC sequences with generative adversarial networks Apr 3, 2018 Generative Adversarial Network Speech Synthesis
SaSLaW: Dialogue Speech Corpus with Audio-visual Egocentric Information Toward Environment-adaptive Dialogue Speech Synthesis Aug 13, 2024 Speech Synthesis Spoken Dialogue Systems
The Emotional Voices Database: Towards Controlling the Emotion Dimension in Voice Generation Systems Jun 25, 2018 Speech Emotion Recognition Speech Synthesis
Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis Oct 23, 2019 Form Speech Synthesis
Improving Generalization Ability of Countermeasures for New Mismatch Scenario by Combining Multiple Advanced Regularization Terms May 18, 2023 Speech Synthesis
Spoken Question Answering and Speech Continuation Using Spectrogram-Powered LLM May 24, 2023 Language Modelling Question Answering
SDS-200: A Swiss German Speech to Standard German Text Corpus May 19, 2022 Speech Synthesis Translation
Humane Speech Synthesis through Zero-Shot Emotion and Disfluency Generation Mar 31, 2024 Language Modeling Language Modelling
Using generative modelling to produce varied intonation for speech synthesis Jun 10, 2019 Sentence Speech Synthesis
Spoof detection using time-delay shallow neural network and feature switching Apr 16, 2019 Speaker Verification Speech Synthesis
Spoofing Speaker Verification Systems with Deep Multi-speaker Text-to-speech Synthesis Oct 29, 2019 Speaker Verification Speech Synthesis
Using hyperlinks to improve multilingual partial parsers Sep 1, 2017 Machine Translation Speech Synthesis
High Fidelity Speech Synthesis with Adversarial Networks Sep 25, 2019 Generative Adversarial Network Speech Synthesis
Self-Supervised Learning for Speech Enhancement through Synthesis Nov 4, 2022 Denoising Self-Supervised Learning
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis Jun 12, 2018 Speaker Verification Speech Synthesis
What does a network layer hear? Analyzing hidden representations of end-to-end ASR through speech synthesis Nov 4, 2019 Speaker Verification Speech Enhancement
Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks Sep 23, 2017 Speech Synthesis text-to-speech
Deep Voice 2: Multi-Speaker Neural Text-to-Speech May 24, 2017 Speech Synthesis text-to-speech
STC Antispoofing Systems for the ASVspoof2019 Challenge Apr 11, 2019 Speech Synthesis Voice Conversion
DeepTalk: Vocal Style Encoding for Speaker Recognition and Speech Synthesis Dec 9, 2020 Speaker Recognition Speech Synthesis
Hierarchical Generative Modeling for Controllable Speech Synthesis Oct 16, 2018 Attribute Speech Synthesis
Learning pronunciation from a foreign language in speech synthesis networks Oct 22, 2018 Speech Synthesis
Learning latent representations for style control and transfer in end-to-end speech synthesis Dec 11, 2018 Speech Synthesis Style Transfer
Deep Voice: Real-time Neural Text-to-Speech Feb 25, 2017 Audio Synthesis Boundary Detection
Half-Truth: A Partially Fake Audio Detection Dataset Apr 8, 2021 Speech Synthesis
Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition Aug 17, 2024 Language Modeling Language Modelling
Language Technology Programme for Icelandic 2019-2023 Mar 20, 2020 Machine Translation speech-recognition
WavLM model ensemble for audio deepfake detection Aug 14, 2024 Audio Deepfake Detection Data Augmentation
The Sound of Silence: Efficiency of First Digit Features in Synthetic Audio Detection Oct 6, 2022 Speech Synthesis Synthetic Speech Detection
Direct speech-to-speech translation with a sequence-to-sequence model Apr 12, 2019 Speech Synthesis Speech-to-Speech Translation
Why We Should Report the Details in Subjective Evaluation of TTS More Rigorously Jun 3, 2023 Speech Synthesis
Deep Residual Neural Networks for Audio Spoofing Detection Jun 30, 2019 Speaker Verification Speech Synthesis
GELP: GAN-Excited Linear Prediction for Speech Synthesis from Mel-spectrogram Apr 8, 2019 Speech Synthesis text-to-speech
FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis Jul 8, 2022 Lip to Speech Synthesis Speech Synthesis
fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit Sep 14, 2021 Speech Synthesis text-to-speech
DeepGesture: A conversational gesture synthesis system based on emotions and semantics Jul 3, 2025 Gesture Generation Motion Synthesis
A Critical Review of Recurrent Neural Networks for Sequence Learning May 29, 2015 Handwriting Recognition Image Captioning
Time out of Mind: Generating Rate of Speech conditioned on emotion and speaker Jan 29, 2023 Speech Synthesis text-to-speech
JSUT corpus: free large-scale Japanese speech corpus for end-to-end speech synthesis Oct 28, 2017 BIG-bench Machine Learning Speech Synthesis
JSSS: free Japanese speech corpus for summarization and simplification Oct 5, 2020 Form Speech Synthesis