| Towards Fully Automatic Annotation of Audio Books for TTS | May 1, 2012 | Speech Recognition, Speech Synthesis | Unverified | 0 |
| Towards human-like spoken dialogue generation between AI agents from written dialogue | Oct 2, 2023 | Dialogue Generation, text-to-speech | Unverified | 0 |
| Towards Lightweight and Stable Zero-shot TTS with Self-distilled Representation Disentanglement | Jan 15, 2025 | Computational Efficiency, CPU | Unverified | 0 |
| Towards MOOCs for Lipreading: Using Synthetic Talking Heads to Train Humans in Lipreading at Scale | Aug 21, 2022 | Lipreading, Lip Reading | Unverified | 0 |
| Towards Natural and Controllable Cross-Lingual Voice Conversion Based on Neural TTS Model and Phonetic Posteriorgram | Feb 3, 2021 | text-to-speech, Text to Speech | Unverified | 0 |
| Towards Natural Bilingual and Code-Switched Speech Synthesis Based on Mix of Monolingual Recordings and Cross-Lingual Voice Conversion | Oct 16, 2020 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Towards Optimizing OCR for Accessibility | Jun 21, 2022 | Optical Character Recognition (OCR), text-to-speech | Unverified | 0 |
| Towards Robust FastSpeech 2 by Modelling Residual Multimodality | Jun 2, 2023 | Decoder, text-to-speech | Unverified | 0 |
| Towards Robust Neural Vocoding for Speech Generation: A Survey | Dec 5, 2019 | Speech Synthesis, Survey | Unverified | 0 |
| Prosody Analysis of Audiobooks | Oct 10, 2023 | Attribute, Language Modeling | Code Available | 0 |
| ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit | Oct 24, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| Systematic Inequalities in Language Technology Performance across the World's Languages | Oct 13, 2021 | Dependency Parsing, Machine Translation | Code Available | 0 |
| Systematic Inequalities in Language Technology Performance across the World’s Languages | May 1, 2022 | Dependency Parsing, Machine Translation | Code Available | 0 |
| Learning High-Frequency Functions Made Easy with Sinusoidal Positional Encoding | Jul 12, 2024 | regression, text-to-speech | Code Available | 0 |
| FPETS : Fully Parallel End-to-End Text-to-Speech System | Dec 12, 2018 | text-to-speech, Text to Speech | Code Available | 0 |
| QSpeech: Low-Qubit Quantum Speech Application Toolkit | May 26, 2022 | text-to-speech, Text to Speech | Code Available | 0 |
| PromptTTS: Controllable Text-to-Speech with Text Descriptions | Nov 22, 2022 | Decoder, Speech Synthesis | Code Available | 0 |
| FluentEditor2: Text-based Speech Editing by Modeling Multi-Scale Acoustic and Prosody Consistency | Sep 28, 2024 | Text to Speech | Code Available | 0 |
| Pretrained Speech Encoders and Efficient Fine-tuning Methods for Speech Translation: UPC at IWSLT 2022 | May 1, 2022 | Decoder, Knowledge Distillation | Code Available | 0 |
| Empirical Evaluation of Deep Learning Model Compression Techniques on the WaveNet Vocoder | Nov 20, 2020 | Model Compression, Quantization | Code Available | 0 |
| Few-Shot Speech Deepfake Detection Adaptation with Gaussian Processes | May 29, 2025 | Audio Deepfake Detection, DeepFake Detection | Code Available | 0 |
| Emphasis Rendering for Conversational Text-to-Speech with Multi-modal Multi-scale Context Modeling | Oct 12, 2024 | text-to-speech, Text to Speech | Code Available | 0 |
| Direct speech-to-speech translation with a sequence-to-sequence model | Apr 12, 2019 | Speech Synthesis, Speech-to-Speech Translation | Code Available | 0 |
| Bayesian Parameter-Efficient Fine-Tuning for Overcoming Catastrophic Forgetting | Feb 19, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| DelightfulTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2021 | Oct 25, 2021 | Speech Synthesis, text-to-speech | Code Available | 0 |
| Preparing an Endangered Language for the Digital Age: The Case of Judeo-Spanish | May 31, 2022 | Machine Translation, Speech Synthesis | Code Available | 0 |
| fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit | Sep 14, 2021 | Speech Synthesis, text-to-speech | Code Available | 0 |
| SPEECH-COCO: 600k Visually Grounded Spoken Captions Aligned to MSCOCO Data Set | Jul 26, 2017 | text-to-speech, Text to Speech | Code Available | 0 |
| SpikeVoice: High-Quality Text-to-Speech Via Efficient Spiking Neural Network | Jul 17, 2024 | text-to-speech, Text to Speech | Code Available | 0 |
| Latent Optimal Paths by Gumbel Propagation for Variational Bayesian Dynamic Programming | Jun 5, 2023 | Bayesian Inference, Singing Voice Synthesis | Code Available | 0 |
| Emotional Voice Conversion using Multitask Learning with Text-to-speech | Nov 11, 2019 | Decoder, text-to-speech | Code Available | 0 |
| JSSS: free Japanese speech corpus for summarization and simplification | Oct 5, 2020 | Form, Speech Synthesis | Code Available | 0 |
| "I've Heard of You!": Generate Spoken Named Entity Recognition Data for Unseen Entities | Dec 26, 2024 | Domain Adaptation, Language Modeling | Code Available | 0 |
| Towards Lifelong Learning of Multilingual Text-To-Speech Synthesis | Oct 9, 2021 | Lifelong learning, Speech Synthesis | Code Available | 0 |
| AI4D -- African Language Program | Apr 6, 2021 | Machine Translation, speech-recognition | Code Available | 0 |
| A Fully Time-domain Neural Model for Subband-based Speech Synthesizer | Oct 12, 2018 | text-to-speech, Text to Speech | Code Available | 0 |
| Predicting distributions with Linearizing Belief Networks | Nov 17, 2015 | Denoising, Facial expression generation | Code Available | 0 |
| Speech Synthesis from Text and Ultrasound Tongue Image-based Articulatory Input | Jul 5, 2021 | Speech Synthesis, text-to-speech | Code Available | 0 |
| Defense for Black-box Attacks on Anti-spoofing Models by Self-Supervised Learning | Jun 5, 2020 | Self-Supervised Learning, Speaker Verification | Code Available | 0 |
| Deep Voice: Real-time Neural Text-to-Speech | Feb 25, 2017 | Audio Synthesis, Boundary Detection | Code Available | 0 |
| IsoChronoMeter: A simple and effective isochronic translation evaluation metric | Oct 14, 2024 | Machine Translation, text-to-speech | Code Available | 0 |
| Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning | Oct 20, 2017 | GPU, Speech Synthesis | Code Available | 0 |
| EmoNews: A Spoken Dialogue System for Expressive News Conversations | Jun 16, 2025 | Language Modeling, Language Modelling | Code Available | 0 |
| When Is TTS Augmentation Through a Pivot Language Useful? | Jul 20, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| Facial Landmark Predictions with Applications to Metaverse | Sep 29, 2022 | Decoder, text-to-speech | Code Available | 0 |
| Extending Text-to-Speech Synthesis with Articulatory Movement Prediction using Ultrasound Tongue Imaging | Jul 12, 2021 | Prediction, Speech Synthesis | Code Available | 0 |
| Text-to-ECG: 12-Lead Electrocardiogram Synthesis conditioned on Clinical Text Reports | Mar 9, 2023 | text-to-speech, Text to Speech | Code Available | 0 |
| Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study | Jan 22, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| PolyGlotFake: A Novel Multilingual and Multimodal DeepFake Dataset | May 14, 2024 | DeepFake Detection, Face Swapping | Code Available | 0 |
| Investigation of enhanced Tacotron text-to-speech synthesis systems with self-attention for pitch accent language | Oct 29, 2018 | Speech Synthesis, text-to-speech | Code Available | 0 |