| RNN Approaches to Text Normalization: A Challenge | Oct 31, 2016 | Text Normalization, text-to-speech | Code Available | 0 |
| Phrase break prediction with bidirectional encoder representations in Japanese text-to-speech synthesis | Apr 26, 2021 | Language Modeling, Language Modelling | Code Available | 0 |
| Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale | Jun 23, 2023 | In-Context Learning, Speech Synthesis | Code Available | 0 |
| A Comparative Study on Transformer vs RNN in Speech Applications | Sep 13, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| Spoofing Speaker Verification Systems with Deep Multi-speaker Text-to-speech Synthesis | Oct 29, 2019 | Speaker Verification, Speech Synthesis | Code Available | 0 |
| Non-Autoregressive Neural Text-to-Speech | May 21, 2019 | text-to-speech, Text to Speech | Code Available | 0 |
| ObamaNet: Photo-realistic lip-sync from text | Dec 6, 2017 | Constrained Lip-synchronization, text-to-speech | Code Available | 0 |
| AlignTTS: Efficient Feed-Forward Text-to-Speech System without Explicit Alignment | Mar 4, 2020 | text-to-speech, Text to Speech | Code Available | 0 |
| Numbers Normalisation in the Inflected Languages: a Case Study of Polish | Aug 1, 2019 | text-to-speech, Text to Speech | Code Available | 0 |
| ASSERT: Anti-Spoofing with Squeeze-Excitation and Residual neTworks | Apr 1, 2019 | Feature Engineering, text-to-speech | Code Available | 0 |
| BanglaFake: Constructing and Evaluating a Specialized Bengali Deepfake Audio Dataset | May 16, 2025 | DeepFake Detection, Face Swapping | Code Available | 0 |
| Neural Voice Puppetry: Audio-driven Facial Reenactment | Dec 11, 2019 | Face Model, Neural Rendering | Code Available | 0 |
| Integrated Speech and Gesture Synthesis | Aug 25, 2021 | Speech Synthesis, text-to-speech | Code Available | 0 |
| Text-to-Video: a Two-stage Framework for Zero-shot Identity-agnostic Talking-head Generation | Aug 12, 2023 | Talking Head Generation, text-to-speech | Code Available | 0 |
| Independent and automatic evaluation of acoustic-to-articulatory inversion models | Nov 15, 2019 | speech-recognition, Speech Recognition | Code Available | 0 |
| Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks | Sep 23, 2017 | Speech Synthesis, text-to-speech | Code Available | 0 |
| Naturalization of Text by the Insertion of Pauses and Filler Words | Nov 7, 2020 | Sentence, text-to-speech | Code Available | 0 |
| Humane Speech Synthesis through Zero-Shot Emotion and Disfluency Generation | Mar 31, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| Deep Voice 2: Multi-Speaker Neural Text-to-Speech | May 24, 2017 | Speech Synthesis, text-to-speech | Code Available | 0 |
| SaSLaW: Dialogue Speech Corpus with Audio-visual Egocentric Information Toward Environment-adaptive Dialogue Speech Synthesis | Aug 13, 2024 | Speech Synthesis, Spoken Dialogue Systems | Code Available | 0 |
| Robust and Unbounded Length Generalization in Autoregressive Transformer-Based Text-to-Speech | Oct 29, 2024 | Decoder, text-to-speech | Code Available | 0 |
| CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages | Mar 27, 2019 | text-to-speech, Text to Speech | Code Available | 0 |
| AraSpot: Arabic Spoken Command Spotting | Mar 29, 2023 | Data Augmentation, Keyword Spotting | Code Available | 0 |
| Multi-Source Spatial Knowledge Understanding for Immersive Visual Text-to-Speech | Oct 18, 2024 | object-detection, Object Detection | Code Available | 0 |
| Multimodal Latent Language Modeling with Next-Token Diffusion | Dec 11, 2024 | Image Generation, Language Modeling | Code Available | 0 |
| Cross-Modal Generalization: Learning in Low Resource Modalities via Meta-Alignment | Dec 4, 2020 | Meta-Learning, text-to-speech | Code Available | 0 |
| Multi-modal and Multi-scale Spatial Environment Understanding for Immersive Visual Text-to-Speech | Dec 16, 2024 | text-to-speech, Text to Speech | Code Available | 0 |
| Continuous Speech Tokenizer in Text To Speech | Oct 22, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| MLS: A Large-Scale Multilingual Dataset for Speech Research | Dec 7, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| Mlphon: A Multifunctional Grapheme-Phoneme Conversion Tool Using Finite State Transducers | Sep 5, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis | Jun 12, 2018 | Speaker Verification, Speech Synthesis | Code Available | 0 |
| High Fidelity Speech Synthesis with Adversarial Networks | Sep 25, 2019 | Generative Adversarial Network, Speech Synthesis | Code Available | 0 |
| A wearable sensor vest for social humanoid robots with GPGPU, IoT, and modular software architecture | Jan 6, 2022 | Speech-to-Text, text-to-speech | Code Available | 0 |
| Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation | Jul 7, 2024 | Text to Speech | Code Available | 0 |
| Adaptation of Tacotron2-based Text-To-Speech for Articulatory-to-Acoustic Mapping using Ultrasound Tongue Imaging | Jul 26, 2021 | text-to-speech, Text to Speech | Code Available | 0 |
| VIFS: An End-to-End Variational Inference for Foley Sound Synthesis | Jun 8, 2023 | Speech Synthesis, text-to-speech | Code Available | 0 |
| Semantic Mask for Transformer based End-to-End Speech Recognition | Dec 6, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| Effective parameter estimation methods for an ExcitNet model in generative text-to-speech systems | May 21, 2019 | parameter estimation, Speech Synthesis | Code Available | 0 |
| The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS | Oct 6, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| Hierarchical Generative Modeling for Controllable Speech Synthesis | Oct 16, 2018 | Attribute, Speech Synthesis | Code Available | 0 |
| MelNet: A Generative Model for Audio in the Frequency Domain | Jun 4, 2019 | Audio Generation, Music Generation | Code Available | 0 |
| Using generative modelling to produce varied intonation for speech synthesis | Jun 10, 2019 | Sentence, Speech Synthesis | Code Available | 0 |
| Applying Phonological Features in Multilingual Text-To-Speech | Oct 7, 2021 | Language Acquisition, text-to-speech | Code Available | 0 |
| Massively Multilingual Neural Grapheme-to-Phoneme Conversion | Aug 4, 2017 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| MaSS: A Large and Clean Multilingual Corpus of Sentence-aligned Spoken Utterances Extracted from the Bible | Jul 30, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| StyleSpeech: Parameter-efficient Fine Tuning for Pre-trained Controllable Text-to-Speech | Aug 27, 2024 | parameter-efficient fine-tuning, text-to-speech | Code Available | 0 |
| Sequence Transduction with Recurrent Neural Networks | Nov 14, 2012 | Machine Translation, Phoneme Recognition | Code Available | 0 |
| Audio Super Resolution using Neural Networks | Aug 2, 2017 | Audio Generation, Audio Super-Resolution | Code Available | 0 |
| Generating Synthetic Speech from SpokenVocab for Speech Translation | Oct 15, 2022 | Data Augmentation, Machine Translation | Code Available | 0 |
| Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition | Aug 17, 2024 | Language Modeling, Language Modelling | Code Available | 0 |