Text-To-Speech Synthesis

Text-To-Speech Synthesis is a machine learning task that involves converting written text into spoken words. The goal is to generate synthetic speech that sounds natural and resembles human speech as closely as possible.

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 226–250 of 332 papers

Title	Date	Tasks	Status	Hype
Multi-speaker Text-to-speech Synthesis Using Deep Gaussian Processes	Aug 7, 2020	Gaussian ProcessesSpeech Synthesis	—Unverified	0
Normalizing Text using Language Modelling based on Phonetics and String Similarity	Jun 25, 2020	Language ModelingLanguage Modelling	—Unverified	0
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech	Jun 8, 2020	Knowledge DistillationSpeech Synthesis	CodeCode Available	1
End-to-End Adversarial Text-to-Speech	Jun 5, 2020	Adversarial TextDynamic Time Warping	CodeCode Available	1
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search	May 22, 2020	text-to-speechText to Speech	CodeCode Available	1
Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis	May 20, 2020	Speech Synthesistext-to-speech	—Unverified	0
Semi-supervised Learning for Multi-speaker Text-to-speech Synthesis Using Discrete Speech Representation	May 16, 2020	DecoderSpeech Synthesis	—Unverified	0
Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis	May 12, 2020	Speech SynthesisStyle Transfer	CodeCode Available	1
Neural Text-to-Speech Synthesis for an Under-Resourced Language in a Diglossic Environment: the Case of Gascon Occitan	May 1, 2020	Speech Synthesistext-to-speech	—Unverified	0
Style Variation as a Vantage Point for Code-Switching	May 1, 2020	Language ModelingLanguage Modelling	—Unverified	0
Comparison of Speech Representations for Automatic Quality Estimation in Multi-Speaker Text-to-Speech Synthesis	Feb 28, 2020	Speech Synthesistext-to-speech	CodeCode Available	0
Using VAEs and Normalizing Flows for One-shot Text-To-Speech Synthesis of Expressive Speech	Nov 28, 2019	DisentanglementExpressive Speech Synthesis	—Unverified	0
Cross-lingual Multi-speaker Text-to-speech Synthesis for Voice Cloning without Using Parallel Corpus for Unseen Speakers	Nov 26, 2019	Speech Synthesistext-to-speech	—Unverified	0
Independent and automatic evaluation of acoustic-to-articulatory inversion models	Nov 15, 2019	speech-recognitionSpeech Recognition	CodeCode Available	0
A unified sequence-to-sequence front-end model for Mandarin text-to-speech synthesis	Nov 11, 2019	Polyphone disambiguationSpeech Synthesis	—Unverified	0
Incremental Text-to-Speech Synthesis with Prefix-to-Prefix Framework	Nov 7, 2019	SentenceSpeech Synthesis	—Unverified	0
Spoofing Speaker Verification Systems with Deep Multi-speaker Text-to-speech Synthesis	Oct 29, 2019	Speaker VerificationSpeech Synthesis	CodeCode Available	0
Effect of choice of probability distribution, randomness, and search methods for alignment modeling in sequence-to-sequence text-to-speech synthesis using hard alignment	Oct 28, 2019	Hard AttentionSpeech Synthesis	—Unverified	0
Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram	Oct 25, 2019	Generative Adversarial NetworkGPU	CodeCode Available	2
The Theory behind Controllable Expressive Speech Synthesis: a Cross-disciplinary Approach	Oct 14, 2019	Expressive Speech SynthesisSociology	—Unverified	0
Modular Meta-Learning with Shrinkage	Sep 12, 2019	Image ClassificationMeta-Learning	—Unverified	0
Evaluating Long-form Text-to-Speech: Comparing the Ratings of Sentences and Paragraphs	Sep 9, 2019	FormSpeech Synthesis	—Unverified	0
Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis	Aug 27, 2019	Speech Synthesistext-to-speech	—Unverified	0
MelNet: A Generative Model for Audio in the Frequency Domain	Jun 4, 2019	Audio GenerationMusic Generation	CodeCode Available	0
Listening while Speaking and Visualizing: Improving ASR through Multimodal Chain	Jun 3, 2019	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified	0

Show:10 25 50

← PrevPage 10 of 14Next →

All datasets LJSpeech 20000 utterances CMUDict 0.7b HUI speech corpus Thorsten voice 21.02 neutral Trinity Speech-Gesture Dataset

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	NaturalSpeech	Audio Quality MOS	4.56	—	Unverified
2	VITS	Audio Quality MOS	4.43	—	Unverified
3	Grad-TTS + HiFiGAN (1000 steps)	Audio Quality MOS	4.37	—	Unverified
4	FastSpeech 2 + HiFiGAN	Audio Quality MOS	4.34	—	Unverified
5	Glow-TTS + HiFiGAN	Audio Quality MOS	4.34	—	Unverified
6	FastSpeech 2 + HiFiGAN	Audio Quality MOS	4.32	—	Unverified
7	FastDiff (4 steps)	Audio Quality MOS	4.28	—	Unverified
8	FastDiff-TTS	Audio Quality MOS	4.03	—	Unverified
9	Transformer TTS (Mel + WaveGlow)	Audio Quality MOS	3.88	—	Unverified
10	FastSpeech (Mel + WaveGlow)	Audio Quality MOS	3.84	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Mia	10-keyword Speech Commands dataset	16	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Token-Level Ensemble Distillation	Phoneme Error Rate	4.6	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	3.74	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	3.49	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Match-TTSG	MOS	3.7	—	Unverified