Text-To-Speech Synthesis

Text-To-Speech Synthesis is a machine learning task that involves converting written text into spoken words. The goal is to generate synthetic speech that sounds natural and resembles human speech as closely as possible.

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 251–300 of 332 papers

Title	Date	Tasks	Status	Hype
Neural Models of Text Normalization for Speech Applications	Jun 1, 2019	BIG-bench Machine LearningSpeech Synthesis	—Unverified	0
Neural Text Normalization with Subword Units	Jun 1, 2019	Machine TranslationNatural Language Understanding	—Unverified	0
FastSpeech: Fast,Robustand Controllable Text-to-Speech	May 22, 2019	Decodertext-to-speech	CodeCode Available	2
FastSpeech: Fast, Robust and Controllable Text to Speech	May 22, 2019	DecoderSpeech Synthesis	CodeCode Available	2
Effective parameter estimation methods for an ExcitNet model in generative text-to-speech systems	May 21, 2019	parameter estimationSpeech Synthesis	CodeCode Available	0
Non-Autoregressive Neural Text-to-Speech	May 21, 2019	text-to-speechText to Speech	CodeCode Available	0
Direct speech-to-speech translation with a sequence-to-sequence model	Apr 12, 2019	Speech SynthesisSpeech-to-Speech Translation	CodeCode Available	0
Token-Level Ensemble Distillation for Grapheme-to-Phoneme Conversion	Apr 6, 2019	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified	0
In Other News: A Bi-style Text-to-speech Model for Synthesizing Newscaster Voice with Limited Data	Apr 4, 2019	Speech Synthesistext-to-speech	CodeCode Available	1
Speech denoising by parametric resynthesis	Apr 2, 2019	DenoisingResynthesis	—Unverified	0
Visualization and Interpretation of Latent Spaces for Controlling Expressive Speech Synthesis through Audio Analysis	Mar 27, 2019	Emotional Speech SynthesisExpressive Speech Synthesis	CodeCode Available	1
Generative adversarial network-based glottal waveform model for statistical parametric speech synthesis	Mar 14, 2019	Generative Adversarial NetworkSpeech Synthesis	—Unverified	0
Exploring Transfer Learning for Low Resource Emotional TTS	Jan 14, 2019	Deep LearningEmotional Speech Synthesis	CodeCode Available	1
AttS2S-VC: Sequence-to-Sequence Voice Conversion with Attention and Context Preservation Mechanisms	Nov 9, 2018	GPUImage Captioning	—Unverified	0
End-to-End Feedback Loss in Speech Chain Framework via Straight-Through Estimator	Oct 31, 2018	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified	0
Waveform generation for text-to-speech synthesis using pitch-synchronous multi-scale generative adversarial networks	Oct 30, 2018	Image GenerationSpeech Synthesis	—Unverified	0
Speaking style adaptation in Text-To-Speech synthesis using Sequence-to-sequence models with attention	Oct 29, 2018	Speech Synthesistext-to-speech	—Unverified	0
Investigation of enhanced Tacotron text-to-speech synthesis systems with self-attention for pitch accent language	Oct 29, 2018	Speech Synthesistext-to-speech	CodeCode Available	0
A Challenge Set and Methods for Noun-Verb Ambiguity	Oct 1, 2018	Speech Synthesistext-to-speech	—Unverified	0
Neural Speech Synthesis with Transformer Network	Sep 19, 2018	DecoderMachine Translation	CodeCode Available	2
Predicting Expressive Speaking Style From Text In End-To-End Speech Synthesis	Aug 4, 2018	Speech Synthesistext-to-speech	—Unverified	0
Wasserstein GAN and Waveform Loss-based Acoustic Model Training for Multi-speaker Text-to-Speech Synthesis Systems Using a WaveNet Vocoder	Jul 31, 2018	Generative Adversarial NetworkSpeech Synthesis	—Unverified	0
The Emotional Voices Database: Towards Controlling the Emotion Dimension in Voice Generation Systems	Jun 25, 2018	Speech Emotion RecognitionSpeech Synthesis	CodeCode Available	0
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis	Jun 12, 2018	Speaker VerificationSpeech Synthesis	CodeCode Available	0
Design and Development of Speech Corpora for Air Traffic Control Training	May 1, 2018	Automatic Speech Recognition (ASR)Speech Recognition	—Unverified	0
SynPaFlex-Corpus: An Expressive French Audiobooks Corpus dedicated to expressive speech synthesis.	May 1, 2018	Expressive Speech SynthesisSpeech Synthesis	—Unverified	0
Improving homograph disambiguation with supervised machine learning	May 1, 2018	BIG-bench Machine LearningSpeech Synthesis	—Unverified	0
Speaker-independent raw waveform model for glottal excitation	Apr 25, 2018	modelSpeech Synthesis	—Unverified	0
Machine Speech Chain with One-shot Speaker Adaptation	Mar 28, 2018	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified	0
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis	Mar 23, 2018	Speech SynthesisStyle Transfer	CodeCode Available	1
Efficient Neural Audio Synthesis	Feb 23, 2018	Audio SynthesisCPU	CodeCode Available	2
Tools and resources for Romanian text-to-speech and speech-to-text applications	Feb 15, 2018	speech-recognitionSpeech Recognition	CodeCode Available	0
Creating New Language and Voice Components for the Updated MaryTTS Text-to-Speech Synthesis Platform	Dec 13, 2017	Speech Synthesistext-to-speech	—Unverified	0
Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention	Oct 24, 2017	text-to-speechText to Speech	CodeCode Available	1
Refer-iTTS: A System for Referring in Spoken Installments to Objects in Real-World Images	Sep 1, 2017	Referring ExpressionReferring expression generation	—Unverified	0
Listening while Speaking: Speech Chain by Deep Learning	Jul 16, 2017	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified	0
CASSANDRA: A multipurpose configurable voice-enabled human-computer-interface	Apr 1, 2017	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified	0
Tacotron: Towards End-to-End Speech Synthesis	Mar 29, 2017	Audio SynthesisSpeech Synthesis	CodeCode Available	1
Automatic Syllabification for Manipuri language	Dec 1, 2016	Automatic Speech Recognition (ASR)Segmentation	—Unverified	0
DNN-based Speech Synthesis for Indian Languages from ASCII text	Aug 18, 2016	Speech Synthesistext-to-speech	—Unverified	0
A Taxonomy of Specific Problem Classes in Text-to-Speech Synthesis: Comparing Commercial and Open Source Performance	May 1, 2016	Speech Synthesistext-to-speech	—Unverified	0
Minimally Supervised Number Normalization	Jan 1, 2016	speech-recognitionSpeech Recognition	—Unverified	0
Text Normalization and Unit Selection for a Memory Based Non Uniform Unit Selection TTS in Malayalam	Dec 1, 2015	Speech SynthesisText Normalization	—Unverified	0
Hierarchical Representation of Prosody for Statistical Speech Synthesis	Oct 7, 2015	Speech Synthesistext-to-speech	—Unverified	0
Individuality-Preserving Spectrum Modification for Articulation Disorders Using Phone Selective Synthesis	Sep 1, 2015	Speech SynthesisText-To-Speech Synthesis	—Unverified	0
Which Synthetic Voice Should I Choose for an Evocative Task?	Sep 1, 2015	Speech SynthesisText-To-Speech Synthesis	—Unverified	0
A distributed cloud-based dialog system for conversational application development	Sep 1, 2015	Speech RecognitionSpeech Synthesis	—Unverified	0
Aligning Opinions: Cross-Lingual Opinion Mining with Dependencies	Jul 1, 2015	Coreference ResolutionNamed Entity Recognition (NER)	—Unverified	0
An In-depth Analysis of the Effect of Text Normalization in Social Media	May 1, 2015	Dependency Parsingnamed-entity-recognition	—Unverified	0
Normalization of Non-Standard Words in Croatian Texts	Mar 27, 2015	FormGeneral Classification	—Unverified	0

Show:10 25 50

← PrevPage 6 of 7Next →

All datasets LJSpeech 20000 utterances CMUDict 0.7b HUI speech corpus Thorsten voice 21.02 neutral Trinity Speech-Gesture Dataset

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	NaturalSpeech	Audio Quality MOS	4.56	—	Unverified
2	VITS	Audio Quality MOS	4.43	—	Unverified
3	Grad-TTS + HiFiGAN (1000 steps)	Audio Quality MOS	4.37	—	Unverified
4	FastSpeech 2 + HiFiGAN	Audio Quality MOS	4.34	—	Unverified
5	Glow-TTS + HiFiGAN	Audio Quality MOS	4.34	—	Unverified
6	FastSpeech 2 + HiFiGAN	Audio Quality MOS	4.32	—	Unverified
7	FastDiff (4 steps)	Audio Quality MOS	4.28	—	Unverified
8	FastDiff-TTS	Audio Quality MOS	4.03	—	Unverified
9	Transformer TTS (Mel + WaveGlow)	Audio Quality MOS	3.88	—	Unverified
10	FastSpeech (Mel + WaveGlow)	Audio Quality MOS	3.84	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Mia	10-keyword Speech Commands dataset	16	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Token-Level Ensemble Distillation	Phoneme Error Rate	4.6	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	3.74	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	3.49	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Match-TTSG	MOS	3.7	—	Unverified