SOTAVerified

Text-To-Speech Synthesis

Text-To-Speech (TTS) Synthesis is a machine learning task that converts written text into spoken audio. The goal is to generate synthetic speech that sounds natural and resembles human speech as closely as possible.
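Before any acoustic modeling, TTS systems typically run a text "front end" that normalizes the input so the model sees pronounceable words rather than digits and abbreviations. A minimal sketch of that step (the abbreviation table and rules below are illustrative, not taken from any system listed on this page):

```python
import re

# Illustrative subset of an abbreviation lexicon; real front ends use
# much larger tables plus rules for dates, currency, and acronyms.
ABBREVIATIONS = {"dr.": "doctor", "st.": "street"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]

def normalize(text: str) -> str:
    """Expand abbreviations and spell out digits so downstream
    acoustic models receive plain pronounceable words."""
    text = text.lower()
    for abbr, full in ABBREVIATIONS.items():
        text = text.replace(abbr, full)
    # Spell out each digit; real systems expand whole numbers instead.
    text = re.sub(r"\d", lambda m: " " + DIGITS[int(m.group())] + " ", text)
    return " ".join(text.split())
```

For example, `normalize("Dr. Smith lives at 4 Elm St.")` yields `"doctor smith lives at four elm street"`; the normalized string is then passed to grapheme-to-phoneme conversion and the acoustic model.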

Papers

Showing 101–150 of 332 papers

| Title | Status | Hype |
|---|---|---|
| MParrotTTS: Multilingual Multi-speaker Text to Speech Synthesis in Low Resource Setting | | 0 |
| A unified front-end framework for English text-to-speech synthesis | | 0 |
| Accented Text-to-Speech Synthesis with Limited Data | | 0 |
| M2-CTTS: End-to-End Multi-scale Multi-modal Conversational Text-to-Speech Synthesis | | 0 |
| A Review of Deep Learning Techniques for Speech Processing | | 0 |
| Zero-shot text-to-speech synthesis conditioned using self-supervised speech representation model | | 0 |
| Enhancing Suno's Bark Text-to-Speech Model: Addressing Limitations Through Meta's Encodec and Pre-Trained Hubert | Code | 4 |
| Text is All You Need: Personalizing ASR Models using Controllable Speech Synthesis | | 0 |
| A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI | | 0 |
| Controllable Prosody Generation With Partial Inputs | | 0 |
| Do Prosody Transfer Models Transfer Prosody? | | 0 |
| Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling | Code | 5 |
| ParrotTTS: Text-to-Speech synthesis by exploiting self-supervised representations | | 0 |
| Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech | Code | 1 |
| A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech | Code | 2 |
| UzbekTagger: The rule-based POS tagger for Uzbek language | | 0 |
| Applying Automated Machine Translation to Educational Video Courses | | 0 |
| Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers | Code | 7 |
| ReVISE: Self-Supervised Speech Resynthesis With Visual Input for Universal and Generalized Speech Regeneration | | 0 |
| ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement | | 0 |
| Text-to-speech synthesis based on latent variable conversion using diffusion probabilistic model and variational autoencoder | | 0 |
| Investigation of Japanese PnG BERT language model in text-to-speech synthesis for pitch accent language | | 0 |
| RWEN-TTS: Relation-aware Word Encoding Network for Natural Text-to-Speech Synthesis | Code | 1 |
| MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset | Code | 1 |
| Towards Building Text-To-Speech Systems for the Next Billion Users | Code | 2 |
| Grad-StyleSpeech: Any-speaker Adaptive Text-to-Speech Synthesis with Diffusion Models | | 0 |
| OverFlow: Putting flows on top of neural transducers for better TTS | Code | 1 |
| ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech | Code | 6 |
| Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder | Code | 1 |
| Technology Pipeline for Large Scale Cross-Lingual Dubbing of Lecture Videos into Multiple Indian Languages | | 0 |
| Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech | | 0 |
| An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era | | 0 |
| Controllable Accented Text-to-Speech Synthesis | | 0 |
| MnTTS: An Open-Source Mongolian Text-to-Speech Synthesis Dataset and Accompanied Baseline | Code | 1 |
| EPIC TTS Models: Empirical Pruning Investigations Characterizing Text-To-Speech Models | | 0 |
| Mlphon: A Multifunctional Grapheme-Phoneme Conversion Tool Using Finite State Transducers | Code | 0 |
| ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech | Code | 3 |
| BERT, can HE predict contrastive focus? Predicting and controlling prominence in neural TTS using a language model | | 0 |
| R-MelNet: Reduced Mel-Spectral Modeling for Neural TTS | | 0 |
| Automatic Prosody Annotation with Pre-Trained Text-Speech Model | Code | 1 |
| BU-TTS: An Open-Source, Bilingual Welsh-English, Text-to-Speech Corpus | | 0 |
| Exploring Transfer Learning for Urdu Speech Synthesis | | 0 |
| Investigating Inter- and Intra-speaker Voice Conversion using Audiobooks | | 0 |
| Preparing an Endangered Language for the Digital Age: The Case of Judeo-Spanish | Code | 0 |
| StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis | Code | 2 |
| PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit | Code | 6 |
| GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech | Code | 2 |
| ReCAB-VAE: Gumbel-Softmax Variational Inference Based on Analytic Divergence | | 0 |
| NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality | Code | 2 |
| Systematic Inequalities in Language Technology Performance across the World’s Languages | Code | 0 |
Page 3 of 7

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | NaturalSpeech | Audio Quality MOS | 4.56 | | Unverified |
| 2 | VITS | Audio Quality MOS | 4.43 | | Unverified |
| 3 | Grad-TTS + HiFiGAN (1000 steps) | Audio Quality MOS | 4.37 | | Unverified |
| 4 | FastSpeech 2 + HiFiGAN | Audio Quality MOS | 4.34 | | Unverified |
| 5 | Glow-TTS + HiFiGAN | Audio Quality MOS | 4.34 | | Unverified |
| 6 | FastSpeech 2 + HiFiGAN | Audio Quality MOS | 4.32 | | Unverified |
| 7 | FastDiff (4 steps) | Audio Quality MOS | 4.28 | | Unverified |
| 8 | FastDiff-TTS | Audio Quality MOS | 4.03 | | Unverified |
| 9 | Transformer TTS (Mel + WaveGlow) | Audio Quality MOS | 3.88 | | Unverified |
| 10 | FastSpeech (Mel + WaveGlow) | Audio Quality MOS | 3.84 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Mia | 10-keyword Speech Commands dataset | 16 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Token-Level Ensemble Distillation | Phoneme Error Rate | 4.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Tacotron 2 | Mean Opinion Score | 3.74 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Tacotron 2 | Mean Opinion Score | 3.49 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Match-TTS | GMOS | 3.7 | | Unverified |
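The benchmark rankings above are dominated by Mean Opinion Score (MOS): listeners rate each sample's naturalness on a 1 to 5 scale, and the mean is reported, usually with a 95% confidence interval. A minimal sketch of the computation (normal approximation to the interval; not code from any listed system):

```python
import math

def mos(ratings):
    """Mean Opinion Score over listener ratings (1-5 scale),
    with a normal-approximation 95% confidence interval."""
    n = len(ratings)
    mean = sum(ratings) / n
    # Sample variance (n - 1 denominator), then standard error of the mean.
    var = sum((r - mean) ** 2 for r in ratings) / (n - 1)
    ci95 = 1.96 * math.sqrt(var / n)
    return mean, ci95
```

For example, `mos([4, 5, 4, 5, 4, 5, 4, 5])` returns a mean of 4.5 with roughly a ±0.37 interval; published MOS figures such as the 4.56 claimed for NaturalSpeech are averages of this kind over many listeners and utterances.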