Text-To-Speech Synthesis

Text-To-Speech Synthesis is a machine learning task that converts written text into spoken audio. The goal is to generate synthetic speech that sounds natural and resembles human speech as closely as possible.

Papers

Showing 51–100 of 332 papers

Title	Date	Tasks	Status	Hype	Score
Multi-Singer: Fast Multi-Singer Singing Voice Vocoder With A Large-Scale Corpus	Dec 20, 2021	Audio Generation, Singing Voice Synthesis	Code Available	1	5
Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration	May 25, 2023	Speech Synthesis, text-to-speech	Code Available	1	5
MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset	Dec 11, 2022	Speech Synthesis, text-to-speech	Code Available	1	5
MnTTS: An Open-Source Mongolian Text-to-Speech Synthesis Dataset and Accompanied Baseline	Sep 22, 2022	Speech Synthesis, text-to-speech	Code Available	1	5
EdiTTS: Score-based Editing for Controllable Text-to-Speech	Oct 6, 2021	Speech Synthesis, Speech-to-Text	Code Available	1	5
ArTST: Arabic Text and Speech Transformer	Oct 25, 2023	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	1	5
QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning	Aug 31, 2023	Representation Learning, Speech Representation Learning	Code Available	1	5
UMETTS: A Unified Framework for Emotional Text-to-Speech Synthesis with Multimodal Prompts	Apr 29, 2024	Contrastive Learning, Speech Synthesis	Code Available	1	5
Enhancing Speech Intelligibility in Text-To-Speech Synthesis using Speaking Style Conversion	Aug 13, 2020	Speech Synthesis, text-to-speech	Code Available	1	5
Learning Arousal-Valence Representation from Categorical Emotion Labels of Speech	Nov 24, 2023	Dimensionality Reduction, Emotion Classification	Code Available	1	5
KazakhTTS: An Open-Source Kazakh Text-to-Speech Synthesis Dataset	Apr 17, 2021	Speech Synthesis, text-to-speech	Code Available	1	5
Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition	Mar 29, 2022	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	1	5
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis	Nov 6, 2020	Decoder, Speech Synthesis	Code Available	1	5
Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation	Aug 3, 2023	Decoder, Quantization	Code Available	1	5
Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech	May 13, 2021	Decoder, Speech Synthesis	Code Available	1	5
Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech	Feb 27, 2023	Speech Synthesis, text-to-speech	Code Available	1	5
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone	Dec 4, 2021	Speech Synthesis, Text-To-Speech Synthesis	Code Available	1	5
Automatic Prosody Annotation with Pre-Trained Text-Speech Model	Jun 16, 2022	Speech Synthesis, text-to-speech	Code Available	1	5
Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis	May 12, 2020	Speech Synthesis, Style Transfer	Code Available	1	5
Fine-grained style control in Transformer-based Text-to-speech Synthesis	Oct 12, 2021	Inductive Bias, Speech Synthesis	Code Available	1	5
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech	Jun 8, 2020	Knowledge Distillation, Speech Synthesis	Code Available	1	5
Exploring Transfer Learning for Low Resource Emotional TTS	Jan 14, 2019	Deep Learning, Emotional Speech Synthesis	Code Available	1	5
In Other News: A Bi-style Text-to-speech Model for Synthesizing Newscaster Voice with Limited Data	Apr 4, 2019	Speech Synthesis, text-to-speech	Code Available	1	5
Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning	Nov 7, 2023	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	1	5
Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder	Nov 7, 2022	Speech Synthesis, text-to-speech	Code Available	1	5
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search	May 22, 2020	text-to-speech, Text to Speech	Code Available	1	5
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis	Jun 12, 2018	Speaker Verification, Speech Synthesis	Code Available	0	5
Tools and resources for Romanian text-to-speech and speech-to-text applications	Feb 15, 2018	speech-recognition, Speech Recognition	Code Available	0	5
Towards Lifelong Learning of Multilingual Text-To-Speech Synthesis	Oct 9, 2021	Lifelong learning, Speech Synthesis	Code Available	0	5
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale	Jun 23, 2023	In-Context Learning, Speech Synthesis	Code Available	0	5
Systematic Inequalities in Language Technology Performance across the World’s Languages	May 1, 2022	Dependency Parsing, Machine Translation	Code Available	0	5
Systematic Inequalities in Language Technology Performance across the World's Languages	Oct 13, 2021	Dependency Parsing, Machine Translation	Code Available	0	5
Comparison of Speech Representations for Automatic Quality Estimation in Multi-Speaker Text-to-Speech Synthesis	Feb 28, 2020	Speech Synthesis, text-to-speech	Code Available	0	5
The Emotional Voices Database: Towards Controlling the Emotion Dimension in Voice Generation Systems	Jun 25, 2018	Speech Emotion Recognition, Speech Synthesis	Code Available	0	5
Spoofing Speaker Verification Systems with Deep Multi-speaker Text-to-speech Synthesis	Oct 29, 2019	Speaker Verification, Speech Synthesis	Code Available	0	5
Speech Synthesis from Text and Ultrasound Tongue Image-based Articulatory Input	Jul 5, 2021	Speech Synthesis, text-to-speech	Code Available	0	5
Attentive Multi-Layer Perceptron for Non-autoregressive Generation	Oct 14, 2023	Machine Translation, Speech Synthesis	Code Available	0	5
Preparing an Endangered Language for the Digital Age: The Case of Judeo-Spanish	May 31, 2022	Machine Translation, Speech Synthesis	Code Available	0	5
Non-Autoregressive Neural Text-to-Speech	May 21, 2019	text-to-speech, Text to Speech	Code Available	0	5
Multimodal Latent Language Modeling with Next-Token Diffusion	Dec 11, 2024	Image Generation, Language Modeling	Code Available	0	5
Bayesian Parameter-Efficient Fine-Tuning for Overcoming Catastrophic Forgetting	Feb 19, 2024	Language Modeling, Language Modelling	Code Available	0	5
MIA-Prognosis: A Deep Learning Framework to Predict Therapy Response	Oct 8, 2020	Deep Learning, Prognosis	Code Available	0	5
Effective parameter estimation methods for an ExcitNet model in generative text-to-speech systems	May 21, 2019	parameter estimation, Speech Synthesis	Code Available	0	5
Mlphon: A Multifunctional Grapheme-Phoneme Conversion Tool Using Finite State Transducers	Sep 5, 2022	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	0	5
ECAPA-TDNN for Multi-speaker Text-to-speech Synthesis	Mar 20, 2022	Speaker Verification, Speech Synthesis	Code Available	0	5
Back Transcription as a Method for Evaluating Robustness of Natural Language Understanding Models to Speech Recognition Errors	Oct 25, 2023	en-US domain classification, en-US Intent Classification	Code Available	0	5
MelNet: A Generative Model for Audio in the Frequency Domain	Jun 4, 2019	Audio Generation, Music Generation	Code Available	0	5
Phrase break prediction with bidirectional encoder representations in Japanese text-to-speech synthesis	Apr 26, 2021	Language Modeling, Language Modelling	Code Available	0	5
Investigation of enhanced Tacotron text-to-speech synthesis systems with self-attention for pitch accent language	Oct 29, 2018	Speech Synthesis, text-to-speech	Code Available	0	5
Direct speech-to-speech translation with a sequence-to-sequence model	Apr 12, 2019	Speech Synthesis, Speech-to-Speech Translation	Code Available	0	5


Datasets: LJSpeech, 20000 utterances, CMUDict 0.7b, HUI speech corpus, Thorsten voice 21.02 neutral, Trinity Speech-Gesture Dataset

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	NaturalSpeech	Audio Quality MOS	4.56	—	Unverified
2	VITS	Audio Quality MOS	4.43	—	Unverified
3	Grad-TTS + HiFiGAN (1000 steps)	Audio Quality MOS	4.37	—	Unverified
4	FastSpeech 2 + HiFiGAN	Audio Quality MOS	4.34	—	Unverified
5	Glow-TTS + HiFiGAN	Audio Quality MOS	4.34	—	Unverified
6	FastSpeech 2 + HiFiGAN	Audio Quality MOS	4.32	—	Unverified
7	FastDiff (4 steps)	Audio Quality MOS	4.28	—	Unverified
8	FastDiff-TTS	Audio Quality MOS	4.03	—	Unverified
9	Transformer TTS (Mel + WaveGlow)	Audio Quality MOS	3.88	—	Unverified
10	FastSpeech (Mel + WaveGlow)	Audio Quality MOS	3.84	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Mia	10-keyword Speech Commands dataset	16	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Token-Level Ensemble Distillation	Phoneme Error Rate	4.6	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	3.74	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	3.49	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Match-TTSG	MOS	3.7	—	Unverified