Text-To-Speech Synthesis

Text-To-Speech Synthesis is a machine learning task that involves converting written text into spoken words. The goal is to generate synthetic speech that sounds natural and resembles human speech as closely as possible.

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 251–275 of 332 papers

Title	Date	Tasks	Status
A unified sequence-to-sequence front-end model for Mandarin text-to-speech synthesis	Nov 11, 2019	Polyphone disambiguationSpeech Synthesis	—Unverified
Incremental Text-to-Speech Synthesis with Prefix-to-Prefix Framework	Nov 7, 2019	SentenceSpeech Synthesis	—Unverified
Spoofing Speaker Verification Systems with Deep Multi-speaker Text-to-speech Synthesis	Oct 29, 2019	Speaker VerificationSpeech Synthesis	CodeCode Available
Effect of choice of probability distribution, randomness, and search methods for alignment modeling in sequence-to-sequence text-to-speech synthesis using hard alignment	Oct 28, 2019	Hard AttentionSpeech Synthesis	—Unverified
The Theory behind Controllable Expressive Speech Synthesis: a Cross-disciplinary Approach	Oct 14, 2019	Expressive Speech SynthesisSociology	—Unverified
Modular Meta-Learning with Shrinkage	Sep 12, 2019	Image ClassificationMeta-Learning	—Unverified
Evaluating Long-form Text-to-Speech: Comparing the Ratings of Sentences and Paragraphs	Sep 9, 2019	FormSpeech Synthesis	—Unverified
Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis	Aug 27, 2019	Speech Synthesistext-to-speech	—Unverified
MelNet: A Generative Model for Audio in the Frequency Domain	Jun 4, 2019	Audio GenerationMusic Generation	CodeCode Available
Listening while Speaking and Visualizing: Improving ASR through Multimodal Chain	Jun 3, 2019	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified
Neural Text Normalization with Subword Units	Jun 1, 2019	Machine TranslationNatural Language Understanding	—Unverified
Neural Models of Text Normalization for Speech Applications	Jun 1, 2019	BIG-bench Machine LearningSpeech Synthesis	—Unverified
Non-Autoregressive Neural Text-to-Speech	May 21, 2019	text-to-speechText to Speech	CodeCode Available
Effective parameter estimation methods for an ExcitNet model in generative text-to-speech systems	May 21, 2019	parameter estimationSpeech Synthesis	CodeCode Available
Direct speech-to-speech translation with a sequence-to-sequence model	Apr 12, 2019	Speech SynthesisSpeech-to-Speech Translation	CodeCode Available
Token-Level Ensemble Distillation for Grapheme-to-Phoneme Conversion	Apr 6, 2019	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified
Speech denoising by parametric resynthesis	Apr 2, 2019	DenoisingResynthesis	—Unverified
Generative adversarial network-based glottal waveform model for statistical parametric speech synthesis	Mar 14, 2019	Generative Adversarial NetworkSpeech Synthesis	—Unverified
AttS2S-VC: Sequence-to-Sequence Voice Conversion with Attention and Context Preservation Mechanisms	Nov 9, 2018	GPUImage Captioning	—Unverified
End-to-End Feedback Loss in Speech Chain Framework via Straight-Through Estimator	Oct 31, 2018	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified
Waveform generation for text-to-speech synthesis using pitch-synchronous multi-scale generative adversarial networks	Oct 30, 2018	Image GenerationSpeech Synthesis	—Unverified
Speaking style adaptation in Text-To-Speech synthesis using Sequence-to-sequence models with attention	Oct 29, 2018	Speech Synthesistext-to-speech	—Unverified
Investigation of enhanced Tacotron text-to-speech synthesis systems with self-attention for pitch accent language	Oct 29, 2018	Speech Synthesistext-to-speech	CodeCode Available
A Challenge Set and Methods for Noun-Verb Ambiguity	Oct 1, 2018	Speech Synthesistext-to-speech	—Unverified
Predicting Expressive Speaking Style From Text In End-To-End Speech Synthesis	Aug 4, 2018	Speech Synthesistext-to-speech	—Unverified

Show:10 25 50

← PrevPage 11 of 14Next →

All datasets LJSpeech 20000 utterances CMUDict 0.7b HUI speech corpus Thorsten voice 21.02 neutral Trinity Speech-Gesture Dataset

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	NaturalSpeech	Audio Quality MOS	4.56	—	Unverified
2	VITS	Audio Quality MOS	4.43	—	Unverified
3	Grad-TTS + HiFiGAN (1000 steps)	Audio Quality MOS	4.37	—	Unverified
4	FastSpeech 2 + HiFiGAN	Audio Quality MOS	4.34	—	Unverified
5	Glow-TTS + HiFiGAN	Audio Quality MOS	4.34	—	Unverified
6	FastSpeech 2 + HiFiGAN	Audio Quality MOS	4.32	—	Unverified
7	FastDiff (4 steps)	Audio Quality MOS	4.28	—	Unverified
8	FastDiff-TTS	Audio Quality MOS	4.03	—	Unverified
9	Transformer TTS (Mel + WaveGlow)	Audio Quality MOS	3.88	—	Unverified
10	FastSpeech (Mel + WaveGlow)	Audio Quality MOS	3.84	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Mia	10-keyword Speech Commands dataset	16	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Token-Level Ensemble Distillation	Phoneme Error Rate	4.6	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	3.74	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	3.49	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Match-TTSG	MOS	3.7	—	Unverified