Speech Synthesis

Speech synthesis is the task of generating speech from another modality, such as text or lip movements.

Please note that the leaderboards here are not directly comparable across studies: each study uses mean opinion score (MOS) as its metric and collects ratings from a different sample of Amazon Mechanical Turk raters.
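To illustrate why MOS values from separate studies are hard to compare, here is a minimal sketch of how a mean opinion score and an approximate 95% confidence interval might be computed from listener ratings. The rating lists, the function name `mean_opinion_score`, and the normal-approximation interval are assumptions made for illustration; they are not taken from any of the listed papers.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Ratings are assumed to be on the usual 1-5 scale. The interval uses a
    normal approximation (1.96 * standard error), which is only a rough
    guide for small rater pools.
    """
    n = len(ratings)
    mos = statistics.fmean(ratings)
    if n < 2:
        return mos, float("nan")
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(n)
    return mos, half_width


# Hypothetical ratings of the same system from two different rater pools.
study_a = [5, 4, 5, 4, 4, 5, 3, 4]
study_b = [4, 3, 4, 4, 3, 5, 3, 4]

mos_a, ci_a = mean_opinion_score(study_a)
mos_b, ci_b = mean_opinion_score(study_b)
print(f"Study A: MOS = {mos_a:.2f} +/- {ci_a:.2f}")
print(f"Study B: MOS = {mos_b:.2f} +/- {ci_b:.2f}")
```

In this made-up example the two rater pools differ by half a point on the same audio, which is larger than several of the gaps between neighbouring leaderboard entries.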

(Image credit: WaveNet: A Generative Model for Raw Audio)

Papers


Showing 426–450 of 1249 papers

Title	Date	Tasks	Status
Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization	Oct 30, 2018	Data Augmentation, Disentanglement	Unverified
Bahasa Harmony: A Comprehensive Dataset for Bahasa Text-to-Speech Synthesis with Discrete Codec Modeling of EnGen-TTS	Oct 9, 2024	Diversity, Speech Synthesis	Unverified
Evaluating expressive speech synthesis from audiobook corpora for conversational phrases	May 1, 2012	Clustering, Expressive Speech Synthesis	Unverified
BAD: An Assistant tool for making verses in Basque	Apr 1, 2012	Speech Synthesis, Text-To-Speech Synthesis	Unverified
Evaluating Long-form Text-to-Speech: Comparing the Ratings of Sentences and Paragraphs	Sep 9, 2019	Form, Speech Synthesis	Unverified
Direct Simultaneous Speech-to-Speech Translation with Variational Monotonic Multihead Attention	Oct 15, 2021	Simultaneous Speech-to-Speech Translation, Speech Synthesis	Unverified
Evaluating Speech-in-Speech Perception via a Humanoid Robot	Dec 19, 2023	Speech Synthesis	Unverified
An In-depth Analysis of the Effect of Text Normalization in Social Media	May 1, 2015	Dependency Parsing, named-entity-recognition	Unverified
Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model	May 16, 2024	Hallucination, Language Modeling	Unverified
Évaluation objective de plongements pour la synthèse de parole guidée par réseaux de neurones (Objective evaluation of embeddings for speech synthesis guided by neural networks)	Jul 1, 2019	Speech Synthesis	Unverified
Evaluation of TTS Systems in Intelligibility and Comprehension Tasks: a Case Study of HTS-2008 and Multisyn Synthesizers	Sep 1, 2012	Speech Synthesis	Unverified
Évaluation segmentale du système de synthèse HTS pour le français (Segmental evaluation of HTS) [in French]	Jun 1, 2012	Speech Synthesis	Unverified
Everyday Speech in the Indian Subcontinent	Oct 14, 2024	Speech Synthesis	Unverified
Advances in Speech Vocoding for Text-to-Speech with Continuous Parameters	Jun 19, 2021	Speech Synthesis, text-to-speech	Unverified
ExcitNet vocoder: A neural excitation model for parametric speech synthesis systems	Nov 9, 2018	Speech Synthesis	Unverified
A Waveform Representation Framework for High-quality Statistical Parametric Speech Synthesis	Oct 6, 2015	Speech Synthesis, Vocal Bursts Intensity Prediction	Unverified
Exploring Speech Enhancement for Low-resource Speech Synthesis	Sep 19, 2023	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Exploring the encoding of linguistic representations in the Fully-Connected Layer of generative CNNs for Speech	Jan 13, 2025	Speech Synthesis	Unverified
Exploring the Potential of Lexical Paraphrases for Mitigating Noise-Induced Comprehension Errors	Jul 18, 2021	Speech Synthesis	Unverified
AV-Flow: Transforming Text to Audio-Visual Human-like Interactions	Feb 18, 2025	Speech Synthesis	Unverified
Exploring Transfer Learning for Urdu Speech Synthesis	Jun 1, 2022	Speech Synthesis, text-to-speech	Unverified
An explainability study of the constant Q cepstral coefficient spoofing countermeasure for automatic speaker verification	Apr 19, 2020	Speaker Verification, Speech Synthesis	Unverified
Expressive Speech Synthesis via Modeling Expressions with Variational Autoencoder	Apr 6, 2018	Expressive Speech Synthesis, Speech Synthesis	Unverified
Accurate synthesis of Dysarthric Speech for ASR data augmentation	Aug 16, 2023	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis	Jun 15, 2023	Denoising, Speech Synthesis	Unverified


Datasets: LibriTTS, North American English, LJSpeech, Mandarin Chinese, Blizzard Challenge 2013

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	PeriodWave-Turbo-L	PESQ	4.45	—	Unverified
2	BigVGAN-v2	PESQ	4.36	—	Unverified
3	EVA-GAN-big	PESQ	4.35	—	Unverified
4	PeriodWave + FreeU	PESQ	4.25	—	Unverified
5	RFWave	PESQ	4.23	—	Unverified
6	BigVSAN (w/ snakebeta)	PESQ	4.12	—	Unverified
7	BigVSAN	PESQ	4.12	—	Unverified
8	EVA-GAN-base	PESQ	4.03	—	Unverified
9	BigVGAN	PESQ	4.03	—	Unverified
10	Vocos	PESQ	3.7	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	4.53	—	Unverified
2	WaveNet (Linguistic)	Mean Opinion Score	4.34	—	Unverified
3	WaveNet (L+F)	Mean Opinion Score	4.21	—	Unverified
4	Tacotron	Mean Opinion Score	4	—	Unverified
5	HMM-driven concatenative	Mean Opinion Score	3.86	—	Unverified
6	LSTM-RNN parametric	Mean Opinion Score	3.67	—	Unverified
7	means	Mean Opinion Score	0	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	BDDM vocoder	Mean Opinion Score	4.48	—	Unverified
2	DiffWave LARGE	Mean Opinion Score	4.44	—	Unverified
3	Neural HMM	Mean Opinion Score	3.24	—	Unverified
4	Neural HMM Ablation with 1 state per phone	Mean Opinion Score	2.68	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	WaveNet (L+F)	Mean Opinion Score	4.08	—	Unverified
2	LSTM-RNN parametric	Mean Opinion Score	3.79	—	Unverified
3	HMM-driven concatenative	Mean Opinion Score	3.47	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SampleRNN (2-tier)	NLL	1.39	—	Unverified
2	SampleRNN (3-tier)	NLL	1.39	—	Unverified