Text-To-Speech Synthesis

Text-To-Speech Synthesis is a machine learning task that involves converting written text into spoken words. The goal is to generate synthetic speech that sounds natural and resembles human speech as closely as possible.

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 201–250 of 332 papers

Title	Date	Tasks	Status	Hype
Alternate Endings: Improving Prosody for Incremental Neural TTS with Predicted Future Text Input	Feb 19, 2021	Language ModelingLanguage Modelling	—Unverified	0
VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention	Feb 12, 2021	Speech Synthesistext-to-speech	—Unverified	0
Voice Cloning: a Multi-Speaker Text-to-Speech Synthesis Approach based on Transfer Learning	Feb 10, 2021	Speech Synthesistext-to-speech	—Unverified	0
Triple M: A Practical Text-to-speech Synthesis System With Multi-guidance Attention And Multi-band Multi-time LPCNet	Jan 30, 2021	CPUSentence	—Unverified	0
Parallel WaveNet conditioned on VAE latent vectors	Dec 17, 2020	SentenceSpeech Synthesis	—Unverified	0
Using previous acoustic context to improve Text-to-Speech synthesis	Dec 7, 2020	DecoderSpeech Synthesis	—Unverified	0
Semi-supervised URL Segmentation with Recurrent Neural Networks Pre-trained on Knowledge Graph Entities	Dec 1, 2020	Chinese Word SegmentationSpeech Synthesis	CodeCode Available	1
Simultaneous Speech-to-Speech Translation System with Neural Incremental ASR, MT, and TTS	Nov 10, 2020	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified	0
Fine-grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement	Nov 8, 2020	DisentanglementSpeech Synthesis	—Unverified	0
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis	Nov 6, 2020	DecoderSpeech Synthesis	CodeCode Available	1
Semi-supervised URL Segmentation with Recurrent Neural NetworksPre-trained on Knowledge Graph Entities	Nov 5, 2020	Chinese Word SegmentationSpeech Synthesis	CodeCode Available	1
Augmenting Images for ASR and TTS through Single-loop and Dual-loop Multimodal Chain Framework	Nov 4, 2020	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified	0
Incremental Machine Speech Chain Towards Enabling Listening while Speaking in Real-time	Nov 4, 2020	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified	0
Effective Deep Learning Models for Automatic Diacritization of Arabic Text	Nov 1, 2020	Arabic Text DiacritizationDecoder	CodeCode Available	1
GraphSpeech: Syntax-Aware Graph Attention Network For Neural Speech Synthesis	Oct 23, 2020	Graph AttentionGraph Neural Network	—Unverified	0
An Investigation of the Relation Between Grapheme Embeddings and Pronunciation for Tacotron-based Systems	Oct 21, 2020	Grapheme-to-Phoneme ConversionRelation	—Unverified	0
End-to-End Text-to-Speech using Latent Duration based on VQ-VAE	Oct 19, 2020	Speech Synthesistext-to-speech	—Unverified	0
MIA-Prognosis: A Deep Learning Framework to Predict Therapy Response	Oct 8, 2020	Deep LearningPrognosis	CodeCode Available	0
Automatic Arabic Dialect Identification Systems for Written Texts: A Survey	Sep 26, 2020	Dialect IdentificationMachine Translation	—Unverified	0
Hierarchical Multi-Grained Generative Model for Expressive Speech Synthesis	Sep 17, 2020	Expressive Speech SynthesisSpeech Synthesis	—Unverified	0
Controllable neural text-to-speech synthesis using intuitive prosodic features	Sep 14, 2020	SentenceSpeech Synthesis	—Unverified	0
What the Future Brings: Investigating the Impact of Lookahead for Incremental Neural TTS	Sep 4, 2020	DecoderSentence	—Unverified	0
Voice Conversion by Cascading Automatic Speech Recognition and Text-to-Speech Synthesis with Prosody Transfer	Sep 3, 2020	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified	0
WaveGrad: Estimating Gradients for Waveform Generation	Sep 2, 2020	Speech SynthesisText-To-Speech Synthesis	CodeCode Available	1
Enhancing Speech Intelligibility in Text-To-Speech Synthesis using Speaking Style Conversion	Aug 13, 2020	Speech Synthesistext-to-speech	CodeCode Available	1
Multi-speaker Text-to-speech Synthesis Using Deep Gaussian Processes	Aug 7, 2020	Gaussian ProcessesSpeech Synthesis	—Unverified	0
Normalizing Text using Language Modelling based on Phonetics and String Similarity	Jun 25, 2020	Language ModelingLanguage Modelling	—Unverified	0
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech	Jun 8, 2020	Knowledge DistillationSpeech Synthesis	CodeCode Available	1
End-to-End Adversarial Text-to-Speech	Jun 5, 2020	Adversarial TextDynamic Time Warping	CodeCode Available	1
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search	May 22, 2020	text-to-speechText to Speech	CodeCode Available	1
Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis	May 20, 2020	Speech Synthesistext-to-speech	—Unverified	0
Semi-supervised Learning for Multi-speaker Text-to-speech Synthesis Using Discrete Speech Representation	May 16, 2020	DecoderSpeech Synthesis	—Unverified	0
Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis	May 12, 2020	Speech SynthesisStyle Transfer	CodeCode Available	1
Neural Text-to-Speech Synthesis for an Under-Resourced Language in a Diglossic Environment: the Case of Gascon Occitan	May 1, 2020	Speech Synthesistext-to-speech	—Unverified	0
Style Variation as a Vantage Point for Code-Switching	May 1, 2020	Language ModelingLanguage Modelling	—Unverified	0
Comparison of Speech Representations for Automatic Quality Estimation in Multi-Speaker Text-to-Speech Synthesis	Feb 28, 2020	Speech Synthesistext-to-speech	CodeCode Available	0
Using VAEs and Normalizing Flows for One-shot Text-To-Speech Synthesis of Expressive Speech	Nov 28, 2019	DisentanglementExpressive Speech Synthesis	—Unverified	0
Cross-lingual Multi-speaker Text-to-speech Synthesis for Voice Cloning without Using Parallel Corpus for Unseen Speakers	Nov 26, 2019	Speech Synthesistext-to-speech	—Unverified	0
Independent and automatic evaluation of acoustic-to-articulatory inversion models	Nov 15, 2019	speech-recognitionSpeech Recognition	CodeCode Available	0
A unified sequence-to-sequence front-end model for Mandarin text-to-speech synthesis	Nov 11, 2019	Polyphone disambiguationSpeech Synthesis	—Unverified	0
Incremental Text-to-Speech Synthesis with Prefix-to-Prefix Framework	Nov 7, 2019	SentenceSpeech Synthesis	—Unverified	0
Spoofing Speaker Verification Systems with Deep Multi-speaker Text-to-speech Synthesis	Oct 29, 2019	Speaker VerificationSpeech Synthesis	CodeCode Available	0
Effect of choice of probability distribution, randomness, and search methods for alignment modeling in sequence-to-sequence text-to-speech synthesis using hard alignment	Oct 28, 2019	Hard AttentionSpeech Synthesis	—Unverified	0
Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram	Oct 25, 2019	Generative Adversarial NetworkGPU	CodeCode Available	2
The Theory behind Controllable Expressive Speech Synthesis: a Cross-disciplinary Approach	Oct 14, 2019	Expressive Speech SynthesisSociology	—Unverified	0
Modular Meta-Learning with Shrinkage	Sep 12, 2019	Image ClassificationMeta-Learning	—Unverified	0
Evaluating Long-form Text-to-Speech: Comparing the Ratings of Sentences and Paragraphs	Sep 9, 2019	FormSpeech Synthesis	—Unverified	0
Neural Harmonic-plus-Noise Waveform Model with Trainable Maximum Voice Frequency for Text-to-Speech Synthesis	Aug 27, 2019	Speech Synthesistext-to-speech	—Unverified	0
MelNet: A Generative Model for Audio in the Frequency Domain	Jun 4, 2019	Audio GenerationMusic Generation	CodeCode Available	0
Listening while Speaking and Visualizing: Improving ASR through Multimodal Chain	Jun 3, 2019	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified	0

Show:10 25 50

← PrevPage 5 of 7Next →

All datasets LJSpeech 20000 utterances CMUDict 0.7b HUI speech corpus Thorsten voice 21.02 neutral Trinity Speech-Gesture Dataset

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	NaturalSpeech	Audio Quality MOS	4.56	—	Unverified
2	VITS	Audio Quality MOS	4.43	—	Unverified
3	Grad-TTS + HiFiGAN (1000 steps)	Audio Quality MOS	4.37	—	Unverified
4	FastSpeech 2 + HiFiGAN	Audio Quality MOS	4.34	—	Unverified
5	Glow-TTS + HiFiGAN	Audio Quality MOS	4.34	—	Unverified
6	FastSpeech 2 + HiFiGAN	Audio Quality MOS	4.32	—	Unverified
7	FastDiff (4 steps)	Audio Quality MOS	4.28	—	Unverified
8	FastDiff-TTS	Audio Quality MOS	4.03	—	Unverified
9	Transformer TTS (Mel + WaveGlow)	Audio Quality MOS	3.88	—	Unverified
10	FastSpeech (Mel + WaveGlow)	Audio Quality MOS	3.84	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Mia	10-keyword Speech Commands dataset	16	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Token-Level Ensemble Distillation	Phoneme Error Rate	4.6	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	3.74	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	3.49	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Match-TTSG	MOS	3.7	—	Unverified