Text-To-Speech Synthesis

Text-To-Speech Synthesis is a machine learning task that involves converting written text into spoken words. The goal is to generate synthetic speech that sounds natural and resembles human speech as closely as possible.

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 171–180 of 332 papers

Title	Date	Tasks	Status	Hype
Towards Lifelong Learning of Multilingual Text-To-Speech Synthesis	Oct 9, 2021	Lifelong learningSpeech Synthesis	CodeCode Available	0
Environment Aware Text-to-Speech Synthesis	Oct 8, 2021	AttributeDisentanglement	—Unverified	0
EdiTTS: Score-based Editing for Controllable Text-to-Speech	Oct 6, 2021	Speech SynthesisSpeech-to-Text	CodeCode Available	1
Prosody-TTS: An end-to-end speech synthesis system with prosody control	Oct 6, 2021	RhythmSpeech Synthesis	—Unverified	0
Neural Speech Synthesis in German	Oct 3, 2021	Speech Synthesistext-to-speech	—Unverified	0
PortaSpeech: Portable and High-Quality Generative Text-to-Speech	Sep 30, 2021	text-to-speechText to Speech	CodeCode Available	2
Conditioning Sequence-to-sequence Networks with Learned Activations	Sep 29, 2021	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified	0
Guided-TTS:Text-to-Speech with Untranscribed Speech	Sep 29, 2021	Speech Synthesistext-to-speech	—Unverified	0
Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network	Sep 22, 2021	Knowledge DistillationLanguage Modeling	—Unverified	0
A Unified Transformer-based Framework for Duplex Text Normalization	Aug 23, 2021	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified	0

Show:10 25 50

← PrevPage 18 of 34Next →

All datasets LJSpeech 20000 utterances CMUDict 0.7b HUI speech corpus Thorsten voice 21.02 neutral Trinity Speech-Gesture Dataset

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	NaturalSpeech	Audio Quality MOS	4.56	—	Unverified
2	VITS	Audio Quality MOS	4.43	—	Unverified
3	Grad-TTS + HiFiGAN (1000 steps)	Audio Quality MOS	4.37	—	Unverified
4	FastSpeech 2 + HiFiGAN	Audio Quality MOS	4.34	—	Unverified
5	Glow-TTS + HiFiGAN	Audio Quality MOS	4.34	—	Unverified
6	FastSpeech 2 + HiFiGAN	Audio Quality MOS	4.32	—	Unverified
7	FastDiff (4 steps)	Audio Quality MOS	4.28	—	Unverified
8	FastDiff-TTS	Audio Quality MOS	4.03	—	Unverified
9	Transformer TTS (Mel + WaveGlow)	Audio Quality MOS	3.88	—	Unverified
10	FastSpeech (Mel + WaveGlow)	Audio Quality MOS	3.84	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Mia	10-keyword Speech Commands dataset	16	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Token-Level Ensemble Distillation	Phoneme Error Rate	4.6	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	3.74	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	3.49	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Match-TTSG	MOS	3.7	—	Unverified