Text-To-Speech Synthesis

Text-To-Speech Synthesis is a machine learning task that involves converting written text into spoken words. The goal is to generate synthetic speech that sounds natural and resembles human speech as closely as possible.

Papers

Showing 251–300 of 332 papers

Title	Date	Tasks	Status
A unified sequence-to-sequence front-end model for Mandarin text-to-speech synthesis	Nov 11, 2019	Polyphone disambiguation, Speech Synthesis	Unverified
Incremental Text-to-Speech Synthesis with Prefix-to-Prefix Framework	Nov 7, 2019	Sentence, Speech Synthesis	Unverified
Spoofing Speaker Verification Systems with Deep Multi-speaker Text-to-speech Synthesis	Oct 29, 2019	Speaker Verification, Speech Synthesis	Code Available
Effect of choice of probability distribution, randomness, and search methods for alignment modeling in sequence-to-sequence text-to-speech synthesis using hard alignment	Oct 28, 2019	Hard Attention, Speech Synthesis	Unverified
The Theory behind Controllable Expressive Speech Synthesis: a Cross-disciplinary Approach	Oct 14, 2019	Expressive Speech Synthesis, Sociology	Unverified
Modular Meta-Learning with Shrinkage	Sep 12, 2019	Image Classification, Meta-Learning	Unverified
Evaluating Long-form Text-to-Speech: Comparing the Ratings of Sentences and Paragraphs	Sep 9, 2019	Form, Speech Synthesis	Unverified
Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis	Aug 27, 2019	Speech Synthesis, text-to-speech	Unverified
MelNet: A Generative Model for Audio in the Frequency Domain	Jun 4, 2019	Audio Generation, Music Generation	Code Available
Listening while Speaking and Visualizing: Improving ASR through Multimodal Chain	Jun 3, 2019	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Neural Text Normalization with Subword Units	Jun 1, 2019	Machine Translation, Natural Language Understanding	Unverified
Neural Models of Text Normalization for Speech Applications	Jun 1, 2019	BIG-bench Machine Learning, Speech Synthesis	Unverified
Non-Autoregressive Neural Text-to-Speech	May 21, 2019	text-to-speech, Text to Speech	Code Available
Effective parameter estimation methods for an ExcitNet model in generative text-to-speech systems	May 21, 2019	parameter estimation, Speech Synthesis	Code Available
Direct speech-to-speech translation with a sequence-to-sequence model	Apr 12, 2019	Speech Synthesis, Speech-to-Speech Translation	Code Available
Token-Level Ensemble Distillation for Grapheme-to-Phoneme Conversion	Apr 6, 2019	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Speech denoising by parametric resynthesis	Apr 2, 2019	Denoising, Resynthesis	Unverified
Generative adversarial network-based glottal waveform model for statistical parametric speech synthesis	Mar 14, 2019	Generative Adversarial Network, Speech Synthesis	Unverified
AttS2S-VC: Sequence-to-Sequence Voice Conversion with Attention and Context Preservation Mechanisms	Nov 9, 2018	GPU, Image Captioning	Unverified
End-to-End Feedback Loss in Speech Chain Framework via Straight-Through Estimator	Oct 31, 2018	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Waveform generation for text-to-speech synthesis using pitch-synchronous multi-scale generative adversarial networks	Oct 30, 2018	Image Generation, Speech Synthesis	Unverified
Speaking style adaptation in Text-To-Speech synthesis using Sequence-to-sequence models with attention	Oct 29, 2018	Speech Synthesis, text-to-speech	Unverified
Investigation of enhanced Tacotron text-to-speech synthesis systems with self-attention for pitch accent language	Oct 29, 2018	Speech Synthesis, text-to-speech	Code Available
A Challenge Set and Methods for Noun-Verb Ambiguity	Oct 1, 2018	Speech Synthesis, text-to-speech	Unverified
Predicting Expressive Speaking Style From Text In End-To-End Speech Synthesis	Aug 4, 2018	Speech Synthesis, text-to-speech	Unverified
Wasserstein GAN and Waveform Loss-based Acoustic Model Training for Multi-speaker Text-to-Speech Synthesis Systems Using a WaveNet Vocoder	Jul 31, 2018	Generative Adversarial Network, Speech Synthesis	Unverified
The Emotional Voices Database: Towards Controlling the Emotion Dimension in Voice Generation Systems	Jun 25, 2018	Speech Emotion Recognition, Speech Synthesis	Code Available
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis	Jun 12, 2018	Speaker Verification, Speech Synthesis	Code Available
Design and Development of Speech Corpora for Air Traffic Control Training	May 1, 2018	Automatic Speech Recognition (ASR), Speech Recognition	Unverified
Improving homograph disambiguation with supervised machine learning	May 1, 2018	BIG-bench Machine Learning, Speech Synthesis	Unverified
SynPaFlex-Corpus: An Expressive French Audiobooks Corpus dedicated to expressive speech synthesis.	May 1, 2018	Expressive Speech Synthesis, Speech Synthesis	Unverified
Speaker-independent raw waveform model for glottal excitation	Apr 25, 2018	model, Speech Synthesis	Unverified
Machine Speech Chain with One-shot Speaker Adaptation	Mar 28, 2018	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Tools and resources for Romanian text-to-speech and speech-to-text applications	Feb 15, 2018	speech-recognition, Speech Recognition	Code Available
Creating New Language and Voice Components for the Updated MaryTTS Text-to-Speech Synthesis Platform	Dec 13, 2017	Speech Synthesis, text-to-speech	Unverified
Refer-iTTS: A System for Referring in Spoken Installments to Objects in Real-World Images	Sep 1, 2017	Referring Expression, Referring expression generation	Unverified
Listening while Speaking: Speech Chain by Deep Learning	Jul 16, 2017	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
CASSANDRA: A multipurpose configurable voice-enabled human-computer-interface	Apr 1, 2017	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Automatic Syllabification for Manipuri language	Dec 1, 2016	Automatic Speech Recognition (ASR), Segmentation	Unverified
DNN-based Speech Synthesis for Indian Languages from ASCII text	Aug 18, 2016	Speech Synthesis, text-to-speech	Unverified
A Taxonomy of Specific Problem Classes in Text-to-Speech Synthesis: Comparing Commercial and Open Source Performance	May 1, 2016	Speech Synthesis, text-to-speech	Unverified
Minimally Supervised Number Normalization	Jan 1, 2016	speech-recognition, Speech Recognition	Unverified
Text Normalization and Unit Selection for a Memory Based Non Uniform Unit Selection TTS in Malayalam	Dec 1, 2015	Speech Synthesis, Text Normalization	Unverified
Hierarchical Representation of Prosody for Statistical Speech Synthesis	Oct 7, 2015	Speech Synthesis, text-to-speech	Unverified
Which Synthetic Voice Should I Choose for an Evocative Task?	Sep 1, 2015	Speech Synthesis, Text-To-Speech Synthesis	Unverified
Individuality-Preserving Spectrum Modification for Articulation Disorders Using Phone Selective Synthesis	Sep 1, 2015	Speech Synthesis, Text-To-Speech Synthesis	Unverified
A distributed cloud-based dialog system for conversational application development	Sep 1, 2015	Speech Recognition, Speech Synthesis	Unverified
Aligning Opinions: Cross-Lingual Opinion Mining with Dependencies	Jul 1, 2015	Coreference Resolution, Named Entity Recognition (NER)	Unverified
An In-depth Analysis of the Effect of Text Normalization in Social Media	May 1, 2015	Dependency Parsing, named-entity-recognition	Unverified
Normalization of Non-Standard Words in Croatian Texts	Mar 27, 2015	Form, General Classification	Unverified

Datasets: LJSpeech, 20000 utterances, CMUDict 0.7b, HUI speech corpus, Thorsten voice 21.02 neutral, Trinity Speech-Gesture Dataset

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	NaturalSpeech	Audio Quality MOS	4.56	—	Unverified
2	VITS	Audio Quality MOS	4.43	—	Unverified
3	Grad-TTS + HiFiGAN (1000 steps)	Audio Quality MOS	4.37	—	Unverified
4	FastSpeech 2 + HiFiGAN	Audio Quality MOS	4.34	—	Unverified
5	Glow-TTS + HiFiGAN	Audio Quality MOS	4.34	—	Unverified
6	FastSpeech 2 + HiFiGAN	Audio Quality MOS	4.32	—	Unverified
7	FastDiff (4 steps)	Audio Quality MOS	4.28	—	Unverified
8	FastDiff-TTS	Audio Quality MOS	4.03	—	Unverified
9	Transformer TTS (Mel + WaveGlow)	Audio Quality MOS	3.88	—	Unverified
10	FastSpeech (Mel + WaveGlow)	Audio Quality MOS	3.84	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Mia	10-keyword Speech Commands dataset	16	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Token-Level Ensemble Distillation	Phoneme Error Rate	4.6	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	3.74	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	3.49	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Match-TTSG	MOS	3.7	—	Unverified