Text-To-Speech Synthesis

Text-To-Speech Synthesis is a machine learning task that involves converting written text into spoken words. The goal is to generate synthetic speech that sounds natural and resembles human speech as closely as possible.

Papers

Showing 151–200 of 332 papers

Title	Date	Tasks	Status
Variations prosodiques en synthèse par sélection d'unités: l'exemple des phrases interrogatives (Prosodic variations in unit-based speech synthesis: the example of interrogative sentences) [in French]	Jun 1, 2012	Speech Synthesis, Text-To-Speech Synthesis	Unverified
Learning Sentiment Lexicons in Spanish	May 1, 2012	Opinion Mining, Question Answering	Unverified
Leveraging supplemental representations for sequential transduction	Jun 1, 2012	Speech Synthesis, Text-To-Speech Synthesis	Unverified
Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications	May 12, 2025	Speech Synthesis, text-to-speech	Unverified
A Review of Deep Learning Techniques for Speech Processing	Apr 30, 2023	Automatic Speech Recognition, Deep Learning	Unverified
Listening while Speaking: Speech Chain by Deep Learning	Jul 16, 2017	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Location, Location: Enhancing the Evaluation of Text-to-Speech Synthesis Using the Rapid Prosody Transcription Paradigm	Jul 6, 2021	Speech Synthesis, text-to-speech	Unverified
Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network	Sep 22, 2021	Knowledge Distillation, Language Modeling	Unverified
Low-Resource Text-to-Speech Synthesis Using Noise-Augmented Training of ForwardTacotron	Jan 10, 2025	Speech Synthesis, text-to-speech	Unverified
M2-CTTS: End-to-End Multi-scale Multi-modal Conversational Text-to-Speech Synthesis	May 3, 2023	Speech Synthesis, text-to-speech	Unverified
Machine Speech Chain with One-shot Speaker Adaptation	Mar 28, 2018	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Vers une annotation automatique de corpus audio pour la synthèse de parole (Towards Fully Automatic Annotation of Audio Books for Text-To-Speech (TTS) Synthesis) [in French]	Jun 1, 2012	Speech Synthesis, text-to-speech	Unverified
Applying Syntax–Prosody Mapping Hypothesis and Prosodic Well-Formedness Constraints to Neural Sequence-to-Sequence Speech Synthesis	Mar 29, 2022	Speech Synthesis, text-to-speech	Unverified
Meta Learning Text-to-Speech Synthesis in over 7000 Languages	Jun 10, 2024	Meta-Learning, Speech Synthesis	Unverified
Minimally Supervised Number Normalization	Jan 1, 2016	speech-recognition, Speech Recognition	Unverified
Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech	Oct 27, 2022	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
MM-TTS: Multi-modal Prompt based Style Transfer for Expressive Text-to-Speech Synthesis	Dec 17, 2023	Speech Synthesis, Style Transfer	Unverified
Accent conversion using discrete units with parallel data synthesized from controllable accented TTS	Sep 30, 2024	Data Augmentation, Speech Synthesis	Unverified
Modular Meta-Learning with Shrinkage	Sep 12, 2019	Image Classification, Meta-Learning	Unverified
Applying Automated Machine Translation to Educational Video Courses	Jan 9, 2023	Machine Translation, Speech Synthesis	Unverified
MParrotTTS: Multilingual Multi-speaker Text to Speech Synthesis in Low Resource Setting	May 19, 2023	Speech Synthesis, text-to-speech	Unverified
Voice Cloning: a Multi-Speaker Text-to-Speech Synthesis Approach based on Transfer Learning	Feb 10, 2021	Speech Synthesis, text-to-speech	Unverified
Multi-Scale Accent Modeling and Disentangling for Multi-Speaker Multi-Accent Text-to-Speech Synthesis	Jun 16, 2024	Disentanglement, Speech Synthesis	Unverified
Voice Conversion by Cascading Automatic Speech Recognition and Text-to-Speech Synthesis with Prosody Transfer	Sep 3, 2020	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Multi-speaker Multi-style Text-to-speech Synthesis With Single-speaker Single-style Training Data Scenarios	Dec 23, 2021	Diversity, Speech Synthesis	Unverified
Multi-speaker Text-to-speech Synthesis Using Deep Gaussian Processes	Aug 7, 2020	Gaussian Processes, Speech Synthesis	Unverified
Multi-Stage Deep Transfer Learning for EmIoT-enabled Human-Computer Interaction	Feb 3, 2022	Human-Object Interaction Detection, text-to-speech	Unverified
Multi-step Natural Language Understanding	Aug 1, 2013	Natural Language Understanding, Speech Recognition	Unverified
Grad-StyleSpeech: Any-speaker Adaptive Text-to-Speech Synthesis with Diffusion Models	Nov 17, 2022	Speech Synthesis, text-to-speech	Unverified
An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era	Oct 6, 2022	Speech Synthesis, text-to-speech	Unverified
Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis	Aug 27, 2019	Speech Synthesis, text-to-speech	Unverified
Neural Models of Text Normalization for Speech Applications	Jun 1, 2019	BIG-bench Machine Learning, Speech Synthesis	Unverified
Neural Speech Synthesis in German	Oct 3, 2021	Speech Synthesis, text-to-speech	Unverified
A Novel Data Augmentation Approach for Automatic Speaking Assessment on Opinion Expressions	Jun 4, 2025	Data Augmentation, Diversity	Unverified
Neural Text Normalization with Subword Units	Jun 1, 2019	Machine Translation, Natural Language Understanding	Unverified
Neural Text-to-Speech Synthesis for an Under-Resourced Language in a Diglossic Environment: the Case of Gascon Occitan	May 1, 2020	Speech Synthesis, text-to-speech	Unverified
Noise-robust zero-shot text-to-speech synthesis conditioned on self-supervised speech-representation model with adapters	Jan 10, 2024	Self-Supervised Learning, Speech Enhancement	Unverified
Normalization of Lithuanian Text Using Regular Expressions	Dec 29, 2023	Speech Synthesis, Text Normalization	Unverified
Normalization of Non-Standard Words in Croatian Texts	Mar 27, 2015	Form, General Classification	Unverified
Normalizing Text using Language Modelling based on Phonetics and String Similarity	Jun 25, 2020	Language Modeling, Language Modelling	Unverified
Open-Source Boundary-Annotated Corpus for Arabic Speech and Language Processing	May 1, 2012	Chunking, Descriptive	Unverified
VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature	Apr 2, 2022	Speech Synthesis, text-to-speech	Unverified
An objective evaluation of the effects of recording conditions and speaker characteristics in multi-speaker deep neural speech synthesis	Jun 3, 2021	Speaker Verification, Speech Synthesis	Unverified
An In-depth Analysis of the Effect of Text Normalization in Social Media	May 1, 2015	Dependency Parsing, named-entity-recognition	Unverified
Parallel WaveNet conditioned on VAE latent vectors	Dec 17, 2020	Sentence, Speech Synthesis	Unverified
ParrotTTS: Text-to-Speech synthesis by exploiting self-supervised representations	Mar 1, 2023	Self-Supervised Learning, Speech Synthesis	Unverified
Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis	Jun 4, 2024	In-Context Learning, Language Modeling	Unverified
PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS	Mar 28, 2021	Representation Learning, Text-To-Speech Synthesis	Unverified
An Experimental Study: Assessing the Combined Framework of WavLM and BEST-RQ for Text-to-Speech Synthesis	Dec 8, 2023	Benchmarking, Quantization	Unverified
Predicting Expressive Speaking Style From Text In End-To-End Speech Synthesis	Aug 4, 2018	Speech Synthesis, text-to-speech	Unverified

Datasets: LJSpeech, 20000 utterances, CMUDict 0.7b, HUI speech corpus, Thorsten voice 21.02 neutral, Trinity Speech-Gesture Dataset

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	NaturalSpeech	Audio Quality MOS	4.56	—	Unverified
2	VITS	Audio Quality MOS	4.43	—	Unverified
3	Grad-TTS + HiFiGAN (1000 steps)	Audio Quality MOS	4.37	—	Unverified
4	FastSpeech 2 + HiFiGAN	Audio Quality MOS	4.34	—	Unverified
5	Glow-TTS + HiFiGAN	Audio Quality MOS	4.34	—	Unverified
6	FastSpeech 2 + HiFiGAN	Audio Quality MOS	4.32	—	Unverified
7	FastDiff (4 steps)	Audio Quality MOS	4.28	—	Unverified
8	FastDiff-TTS	Audio Quality MOS	4.03	—	Unverified
9	Transformer TTS (Mel + WaveGlow)	Audio Quality MOS	3.88	—	Unverified
10	FastSpeech (Mel + WaveGlow)	Audio Quality MOS	3.84	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Mia	10-keyword Speech Commands dataset	16	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Token-Level Ensemble Distillation	Phoneme Error Rate	4.6	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	3.74	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	3.49	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Match-TTSG	MOS	3.7	—	Unverified