| Betray Oneself: A Novel Audio DeepFake Detection Model via Mono-to-Stereo Conversion | May 25, 2023 | Audio Deepfake Detection, DeepFake Detection | Code Available | 0 | 5 |
| An Open Source Web Reader for Under-Resourced Languages | Jun 1, 2022 | text-to-speech, Text to Speech | Code Available | 0 | 5 |
| Anonymizing Speech with Generative Adversarial Networks to Preserve Speaker Privacy | Oct 13, 2022 | Generative Adversarial Network, Speaker anonymization | Code Available | 0 | 5 |
| ObamaNet: Photo-realistic lip-sync from text | Dec 6, 2017 | Constrained Lip-synchronization, text-to-speech | Code Available | 0 | 5 |
| SpikeVoice: High-Quality Text-to-Speech Via Efficient Spiking Neural Network | Jul 17, 2024 | text-to-speech, Text to Speech | Code Available | 0 | 5 |
| Numbers Normalisation in the Inflected Languages: a Case Study of Polish | Aug 1, 2019 | text-to-speech, Text to Speech | Code Available | 0 | 5 |
| Bayesian Parameter-Efficient Fine-Tuning for Overcoming Catastrophic Forgetting | Feb 19, 2024 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| An investigation of phrase break prediction in an End-to-End TTS system | Apr 9, 2023 | Prediction, text-to-speech | Code Available | 0 | 5 |
| BanglaFake: Constructing and Evaluating a Specialized Bengali Deepfake Audio Dataset | May 16, 2025 | DeepFake Detection, Face Swapping | Code Available | 0 | 5 |
| Neural Voice Puppetry: Audio-driven Facial Reenactment | Dec 11, 2019 | Face Model, Neural Rendering | Code Available | 0 | 5 |
| A wearable sensor vest for social humanoid robots with GPGPU, IoT, and modular software architecture | Jan 6, 2022 | Speech-to-Text, text-to-speech | Code Available | 0 | 5 |
| Multimodal Latent Language Modeling with Next-Token Diffusion | Dec 11, 2024 | Image Generation, Language Modeling | Code Available | 0 | 5 |
| Multi-modal and Multi-scale Spatial Environment Understanding for Immersive Visual Text-to-Speech | Dec 16, 2024 | text-to-speech, Text to Speech | Code Available | 0 | 5 |
| Multi-Source Spatial Knowledge Understanding for Immersive Visual Text-to-Speech | Oct 18, 2024 | object-detection, Object Detection | Code Available | 0 | 5 |
| MLS: A Large-Scale Multilingual Dataset for Speech Research | Dec 7, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 | 5 |
| Meta Learning Text-to-Speech Synthesis in over 7000 Languages | Jun 10, 2024 | Meta-Learning, Speech Synthesis | Code Available | 0 | 5 |
| MelNet: A Generative Model for Audio in the Frequency Domain | Jun 4, 2019 | Audio Generation, Music Generation | Code Available | 0 | 5 |
| Mlphon: A Multifunctional Grapheme-Phoneme Conversion Tool Using Finite State Transducers | Sep 5, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 | 5 |
| Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning | Oct 20, 2017 | GPU, Speech Synthesis | Code Available | 0 | 5 |
| Deep Voice 2: Multi-Speaker Neural Text-to-Speech | May 24, 2017 | Speech Synthesis, text-to-speech | Code Available | 0 | 5 |
| Luganda Text-to-Speech Machine | May 11, 2020 | text-to-speech, Text to Speech | Code Available | 0 | 5 |
| Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis | Oct 23, 2019 | Form, Speech Synthesis | Code Available | 0 | 5 |
| Low-Resource Multilingual and Zero-Shot Multispeaker TTS | Oct 21, 2022 | Meta-Learning, text-to-speech | Code Available | 0 | 5 |
| LibriS2S: A German-English Speech-to-Speech Translation Corpus | Apr 22, 2022 | Speech-to-Speech Translation, Speech-to-Text | Code Available | 0 | 5 |
| Let's Give a Voice to Conversational Agents in Virtual Reality | Aug 4, 2023 | Speech-to-Text, text-to-speech | Code Available | 0 | 5 |
| Learning Speaker Embedding from Text-to-Speech | Oct 21, 2020 | Classification, Decoder | Code Available | 0 | 5 |
| Audio Super Resolution using Neural Networks | Aug 2, 2017 | Audio Generation, Audio Super-Resolution | Code Available | 0 | 5 |
| Learning High-Frequency Functions Made Easy with Sinusoidal Positional Encoding | Jul 12, 2024 | regression, text-to-speech | Code Available | 0 | 5 |
| MaSS: A Large and Clean Multilingual Corpus of Sentence-aligned Spoken Utterances Extracted from the Bible | Jul 30, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 | 5 |
| Language-Agnostic Meta-Learning for Low-Resource Text-to-Speech with Articulatory Features | Mar 7, 2022 | Meta-Learning, text-to-speech | Code Available | 0 | 5 |
| JSSS: free Japanese speech corpus for summarization and simplification | Oct 5, 2020 | Form, Speech Synthesis | Code Available | 0 | 5 |
| Latent Optimal Paths by Gumbel Propagation for Variational Bayesian Dynamic Programming | Jun 5, 2023 | Bayesian Inference, Singing Voice Synthesis | Code Available | 0 | 5 |
| "I've Heard of You!": Generate Spoken Named Entity Recognition Data for Unseen Entities | Dec 26, 2024 | Domain Adaptation, Language Modeling | Code Available | 0 | 5 |
| IsoChronoMeter: A simple and effective isochronic translation evaluation metric | Oct 14, 2024 | Machine Translation, text-to-speech | Code Available | 0 | 5 |
| Integrated Speech and Gesture Synthesis | Aug 25, 2021 | Speech Synthesis, text-to-speech | Code Available | 0 | 5 |
| Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech | Mar 6, 2021 | text-to-speech, Text to Speech | Code Available | 0 | 5 |
| Investigation of enhanced Tacotron text-to-speech synthesis systems with self-attention for pitch accent language | Oct 29, 2018 | Speech Synthesis, text-to-speech | Code Available | 0 | 5 |
| CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages | Mar 27, 2019 | text-to-speech, Text to Speech | Code Available | 0 | 5 |
| Independent and automatic evaluation of acoustic-to-articulatory inversion models | Nov 15, 2019 | speech-recognition, Speech Recognition | Code Available | 0 | 5 |
| Improving LPCNet-based Text-to-Speech with Linear Prediction-structured Mixture Density Network | Jan 31, 2020 | Quantization, Speech Synthesis | Code Available | 0 | 5 |
| Massively Multilingual Neural Grapheme-to-Phoneme Conversion | Aug 4, 2017 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 | 5 |
| Naturalization of Text by the Insertion of Pauses and Filler Words | Nov 7, 2020 | Sentence, text-to-speech | Code Available | 0 | 5 |
| High Fidelity Speech Synthesis with Adversarial Networks | Sep 25, 2019 | Generative Adversarial Network, Speech Synthesis | Code Available | 0 | 5 |
| Humane Speech Synthesis through Zero-Shot Emotion and Disfluency Generation | Mar 31, 2024 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| Hierarchical Generative Modeling for Controllable Speech Synthesis | Oct 16, 2018 | Attribute, Speech Synthesis | Code Available | 0 | 5 |
| Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis | Nov 12, 2020 | Speech Synthesis, text-to-speech | Code Available | 0 | 5 |
| Cross-Modal Generalization: Learning in Low Resource Modalities via Meta-Alignment | Dec 4, 2020 | Meta-Learning, text-to-speech | Code Available | 0 | 5 |
| Attentive Multi-Layer Perceptron for Non-autoregressive Generation | Oct 14, 2023 | Machine Translation, Speech Synthesis | Code Available | 0 | 5 |
| Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition | Aug 17, 2024 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| Generating Synthetic Audio Data for Attention-Based Speech Recognition Systems | Dec 19, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 | 5 |