Text-To-Speech Synthesis

Text-To-Speech Synthesis is a machine learning task that involves converting written text into spoken words. The goal is to generate synthetic speech that sounds natural and resembles human speech as closely as possible.

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 176–200 of 332 papers

Title	Date	Tasks	Status
An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era	Oct 6, 2022	Speech Synthesistext-to-speech	—Unverified
Controllable Accented Text-to-Speech Synthesis	Sep 22, 2022	Speech Synthesistext-to-speech	—Unverified
EPIC TTS Models: Empirical Pruning Investigations Characterizing Text-To-Speech Models	Sep 22, 2022	Speech Synthesistext-to-speech	—Unverified
Mlphon: A Multifunctional Grapheme-Phoneme Conversion Tool Using Finite State Transducers	Sep 5, 2022	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	CodeCode Available
BERT, can HE predict contrastive focus? Predicting and controlling prominence in neural TTS using a language model	Jul 4, 2022	Language ModelingLanguage Modelling	—Unverified
R-MelNet: Reduced Mel-Spectral Modeling for Neural TTS	Jun 30, 2022	DecoderGPU	—Unverified
Exploring Transfer Learning for Urdu Speech Synthesis	Jun 1, 2022	Speech Synthesistext-to-speech	—Unverified
BU-TTS: An Open-Source, Bilingual Welsh-English, Text-to-Speech Corpus	Jun 1, 2022	Speech Synthesistext-to-speech	—Unverified
Investigating Inter- and Intra-speaker Voice Conversion using Audiobooks	Jun 1, 2022	Speech Synthesistext-to-speech	—Unverified
Preparing an Endangered Language for the Digital Age: The Case of Judeo-Spanish	May 31, 2022	Machine TranslationSpeech Synthesis	CodeCode Available
ReCAB-VAE: Gumbel-Softmax Variational Inference Based on Analytic Divergence	May 9, 2022	Speech Synthesistext-to-speech	—Unverified
Systematic Inequalities in Language Technology Performance across the World’s Languages	May 1, 2022	Dependency ParsingMachine Translation	CodeCode Available
The PartialSpoof Database and Countermeasures for the Detection of Short Fake Speech Segments Embedded in an Utterance	Apr 11, 2022	Speaker VerificationSpeech Synthesis	—Unverified
SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural Text-to-Speech Synthesis	Apr 6, 2022	Speech Synthesistext-to-speech	—Unverified
VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature	Apr 2, 2022	Speech Synthesistext-to-speech	—Unverified
Applying Syntaxx2013Prosody Mapping Hypothesis and Prosodic Well-Formedness Constraints to Neural Sequence-to-Sequence Speech Synthesis	Mar 29, 2022	Speech Synthesistext-to-speech	—Unverified
AutoTTS: End-to-End Text-to-Speech Synthesis through Differentiable Duration Modeling	Mar 21, 2022	DecoderSpeech Synthesis	—Unverified
ECAPA-TDNN for Multi-speaker Text-to-speech Synthesis	Mar 20, 2022	Speaker VerificationSpeech Synthesis	CodeCode Available
Text-free non-parallel many-to-many voice conversion using normalising flows	Mar 15, 2022	Normalising FlowsSpeech Synthesis	—Unverified
Deep Performer: Score-to-Audio Music Performance Synthesis	Feb 12, 2022	DecoderSpeech Synthesis	—Unverified
Multi-Stage Deep Transfer Learning for EmIoT-enabled Human-Computer Interaction	Feb 3, 2022	Human-Object Interaction Detectiontext-to-speech	—Unverified
Transformer-based Models of Text Normalization for Speech Applications	Feb 1, 2022	SentenceSpeech Synthesis	—Unverified
Multi-speaker Multi-style Text-to-speech Synthesis With Single-speaker Single-style Training Data Scenarios	Dec 23, 2021	DiversitySpeech Synthesis	—Unverified
Guided-TTS: A Diffusion Model for Text-to-Speech via Classifier Guidance	Nov 23, 2021	speech-recognitionSpeech Recognition	—Unverified
Systematic Inequalities in Language Technology Performance across the World's Languages	Oct 13, 2021	Dependency ParsingMachine Translation	CodeCode Available

Show:10 25 50

← PrevPage 8 of 14Next →

All datasets LJSpeech 20000 utterances CMUDict 0.7b HUI speech corpus Thorsten voice 21.02 neutral Trinity Speech-Gesture Dataset

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	NaturalSpeech	Audio Quality MOS	4.56	—	Unverified
2	VITS	Audio Quality MOS	4.43	—	Unverified
3	Grad-TTS + HiFiGAN (1000 steps)	Audio Quality MOS	4.37	—	Unverified
4	FastSpeech 2 + HiFiGAN	Audio Quality MOS	4.34	—	Unverified
5	Glow-TTS + HiFiGAN	Audio Quality MOS	4.34	—	Unverified
6	FastSpeech 2 + HiFiGAN	Audio Quality MOS	4.32	—	Unverified
7	FastDiff (4 steps)	Audio Quality MOS	4.28	—	Unverified
8	FastDiff-TTS	Audio Quality MOS	4.03	—	Unverified
9	Transformer TTS (Mel + WaveGlow)	Audio Quality MOS	3.88	—	Unverified
10	FastSpeech (Mel + WaveGlow)	Audio Quality MOS	3.84	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Mia	10-keyword Speech Commands dataset	16	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Token-Level Ensemble Distillation	Phoneme Error Rate	4.6	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	3.74	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	3.49	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Match-TTSG	MOS	3.7	—	Unverified