Text-To-Speech Synthesis

Text-To-Speech Synthesis is a machine learning task that involves converting written text into spoken words. The goal is to generate synthetic speech that sounds natural and resembles human speech as closely as possible.

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 226–250 of 332 papers

Title	Date	Tasks	Status
Parallel WaveNet conditioned on VAE latent vectors	Dec 17, 2020	SentenceSpeech Synthesis	—Unverified
Using previous acoustic context to improve Text-to-Speech synthesis	Dec 7, 2020	DecoderSpeech Synthesis	—Unverified
Simultaneous Speech-to-Speech Translation System with Neural Incremental ASR, MT, and TTS	Nov 10, 2020	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified
Fine-grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement	Nov 8, 2020	DisentanglementSpeech Synthesis	—Unverified
Augmenting Images for ASR and TTS through Single-loop and Dual-loop Multimodal Chain Framework	Nov 4, 2020	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified
Incremental Machine Speech Chain Towards Enabling Listening while Speaking in Real-time	Nov 4, 2020	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified
GraphSpeech: Syntax-Aware Graph Attention Network For Neural Speech Synthesis	Oct 23, 2020	Graph AttentionGraph Neural Network	—Unverified
An Investigation of the Relation Between Grapheme Embeddings and Pronunciation for Tacotron-based Systems	Oct 21, 2020	Grapheme-to-Phoneme ConversionRelation	—Unverified
End-to-End Text-to-Speech using Latent Duration based on VQ-VAE	Oct 19, 2020	Speech Synthesistext-to-speech	—Unverified
MIA-Prognosis: A Deep Learning Framework to Predict Therapy Response	Oct 8, 2020	Deep LearningPrognosis	CodeCode Available
Automatic Arabic Dialect Identification Systems for Written Texts: A Survey	Sep 26, 2020	Dialect IdentificationMachine Translation	—Unverified
Hierarchical Multi-Grained Generative Model for Expressive Speech Synthesis	Sep 17, 2020	Expressive Speech SynthesisSpeech Synthesis	—Unverified
Controllable neural text-to-speech synthesis using intuitive prosodic features	Sep 14, 2020	SentenceSpeech Synthesis	—Unverified
What the Future Brings: Investigating the Impact of Lookahead for Incremental Neural TTS	Sep 4, 2020	DecoderSentence	—Unverified
Voice Conversion by Cascading Automatic Speech Recognition and Text-to-Speech Synthesis with Prosody Transfer	Sep 3, 2020	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified
Multi-speaker Text-to-speech Synthesis Using Deep Gaussian Processes	Aug 7, 2020	Gaussian ProcessesSpeech Synthesis	—Unverified
Normalizing Text using Language Modelling based on Phonetics and String Similarity	Jun 25, 2020	Language ModelingLanguage Modelling	—Unverified
Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis	May 20, 2020	Speech Synthesistext-to-speech	—Unverified
Semi-supervised Learning for Multi-speaker Text-to-speech Synthesis Using Discrete Speech Representation	May 16, 2020	DecoderSpeech Synthesis	—Unverified
Style Variation as a Vantage Point for Code-Switching	May 1, 2020	Language ModelingLanguage Modelling	—Unverified
Neural Text-to-Speech Synthesis for an Under-Resourced Language in a Diglossic Environment: the Case of Gascon Occitan	May 1, 2020	Speech Synthesistext-to-speech	—Unverified
Comparison of Speech Representations for Automatic Quality Estimation in Multi-Speaker Text-to-Speech Synthesis	Feb 28, 2020	Speech Synthesistext-to-speech	CodeCode Available
Using VAEs and Normalizing Flows for One-shot Text-To-Speech Synthesis of Expressive Speech	Nov 28, 2019	DisentanglementExpressive Speech Synthesis	—Unverified
Cross-lingual Multi-speaker Text-to-speech Synthesis for Voice Cloning without Using Parallel Corpus for Unseen Speakers	Nov 26, 2019	Speech Synthesistext-to-speech	—Unverified
Independent and automatic evaluation of acoustic-to-articulatory inversion models	Nov 15, 2019	speech-recognitionSpeech Recognition	CodeCode Available

Show:10 25 50

← PrevPage 10 of 14Next →

All datasets LJSpeech 20000 utterances CMUDict 0.7b HUI speech corpus Thorsten voice 21.02 neutral Trinity Speech-Gesture Dataset

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	NaturalSpeech	Audio Quality MOS	4.56	—	Unverified
2	VITS	Audio Quality MOS	4.43	—	Unverified
3	Grad-TTS + HiFiGAN (1000 steps)	Audio Quality MOS	4.37	—	Unverified
4	FastSpeech 2 + HiFiGAN	Audio Quality MOS	4.34	—	Unverified
5	Glow-TTS + HiFiGAN	Audio Quality MOS	4.34	—	Unverified
6	FastSpeech 2 + HiFiGAN	Audio Quality MOS	4.32	—	Unverified
7	FastDiff (4 steps)	Audio Quality MOS	4.28	—	Unverified
8	FastDiff-TTS	Audio Quality MOS	4.03	—	Unverified
9	Transformer TTS (Mel + WaveGlow)	Audio Quality MOS	3.88	—	Unverified
10	FastSpeech (Mel + WaveGlow)	Audio Quality MOS	3.84	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Mia	10-keyword Speech Commands dataset	16	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Token-Level Ensemble Distillation	Phoneme Error Rate	4.6	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	3.74	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	3.49	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Match-TTSG	MOS	3.7	—	Unverified