Text-To-Speech Synthesis

Text-To-Speech Synthesis is a machine learning task that converts written text into spoken words. The goal is to generate synthetic speech that sounds as natural and as close to human speech as possible.

Papers


Showing 276–300 of 332 papers

Title	Date	Tasks	Status
Wasserstein GAN and Waveform Loss-based Acoustic Model Training for Multi-speaker Text-to-Speech Synthesis Systems Using a WaveNet Vocoder	Jul 31, 2018	Generative Adversarial Network, Speech Synthesis	Unverified
The Emotional Voices Database: Towards Controlling the Emotion Dimension in Voice Generation Systems	Jun 25, 2018	Speech Emotion Recognition, Speech Synthesis	Code Available
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis	Jun 12, 2018	Speaker Verification, Speech Synthesis	Code Available
Design and Development of Speech Corpora for Air Traffic Control Training	May 1, 2018	Automatic Speech Recognition (ASR), Speech Recognition	Unverified
Improving homograph disambiguation with supervised machine learning	May 1, 2018	BIG-bench Machine Learning, Speech Synthesis	Unverified
SynPaFlex-Corpus: An Expressive French Audiobooks Corpus dedicated to expressive speech synthesis.	May 1, 2018	Expressive Speech Synthesis, Speech Synthesis	Unverified
Speaker-independent raw waveform model for glottal excitation	Apr 25, 2018	model, Speech Synthesis	Unverified
Machine Speech Chain with One-shot Speaker Adaptation	Mar 28, 2018	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Tools and resources for Romanian text-to-speech and speech-to-text applications	Feb 15, 2018	speech-recognition, Speech Recognition	Code Available
Creating New Language and Voice Components for the Updated MaryTTS Text-to-Speech Synthesis Platform	Dec 13, 2017	Speech Synthesis, text-to-speech	Unverified
Refer-iTTS: A System for Referring in Spoken Installments to Objects in Real-World Images	Sep 1, 2017	Referring Expression, Referring expression generation	Unverified
Listening while Speaking: Speech Chain by Deep Learning	Jul 16, 2017	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
CASSANDRA: A multipurpose configurable voice-enabled human-computer-interface	Apr 1, 2017	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Automatic Syllabification for Manipuri language	Dec 1, 2016	Automatic Speech Recognition (ASR), Segmentation	Unverified
DNN-based Speech Synthesis for Indian Languages from ASCII text	Aug 18, 2016	Speech Synthesis, text-to-speech	Unverified
A Taxonomy of Specific Problem Classes in Text-to-Speech Synthesis: Comparing Commercial and Open Source Performance	May 1, 2016	Speech Synthesis, text-to-speech	Unverified
Minimally Supervised Number Normalization	Jan 1, 2016	speech-recognition, Speech Recognition	Unverified
Text Normalization and Unit Selection for a Memory Based Non Uniform Unit Selection TTS in Malayalam	Dec 1, 2015	Speech Synthesis, Text Normalization	Unverified
Hierarchical Representation of Prosody for Statistical Speech Synthesis	Oct 7, 2015	Speech Synthesis, text-to-speech	Unverified
Which Synthetic Voice Should I Choose for an Evocative Task?	Sep 1, 2015	Speech Synthesis, Text-To-Speech Synthesis	Unverified
Individuality-Preserving Spectrum Modification for Articulation Disorders Using Phone Selective Synthesis	Sep 1, 2015	Speech Synthesis, Text-To-Speech Synthesis	Unverified
A distributed cloud-based dialog system for conversational application development	Sep 1, 2015	Speech Recognition, Speech Synthesis	Unverified
Aligning Opinions: Cross-Lingual Opinion Mining with Dependencies	Jul 1, 2015	Coreference Resolution, Named Entity Recognition (NER)	Unverified
An In-depth Analysis of the Effect of Text Normalization in Social Media	May 1, 2015	Dependency Parsing, named-entity-recognition	Unverified
Normalization of Non-Standard Words in Croatian Texts	Mar 27, 2015	Form, General Classification	Unverified



Datasets: LJSpeech, 20000 utterances, CMUDict 0.7b, HUI speech corpus, Thorsten voice 21.02 neutral, Trinity Speech-Gesture Dataset

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	NaturalSpeech	Audio Quality MOS	4.56	—	Unverified
2	VITS	Audio Quality MOS	4.43	—	Unverified
3	Grad-TTS + HiFiGAN (1000 steps)	Audio Quality MOS	4.37	—	Unverified
4	FastSpeech 2 + HiFiGAN	Audio Quality MOS	4.34	—	Unverified
5	Glow-TTS + HiFiGAN	Audio Quality MOS	4.34	—	Unverified
6	FastSpeech 2 + HiFiGAN	Audio Quality MOS	4.32	—	Unverified
7	FastDiff (4 steps)	Audio Quality MOS	4.28	—	Unverified
8	FastDiff-TTS	Audio Quality MOS	4.03	—	Unverified
9	Transformer TTS (Mel + WaveGlow)	Audio Quality MOS	3.88	—	Unverified
10	FastSpeech (Mel + WaveGlow)	Audio Quality MOS	3.84	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Mia	10-keyword Speech Commands dataset	16	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Token-Level Ensemble Distillation	Phoneme Error Rate	4.6	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	3.74	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	3.49	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Match-TTSG	MOS	3.7	—	Unverified