Speech Synthesis

Speech synthesis is the task of generating speech from another modality, such as text or lip movements.

Note that the leaderboards here are not really comparable between studies, because they use mean opinion score (MOS) as the metric and each study collects a different sample of ratings from Amazon Mechanical Turk.
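As a rough illustration of why such scores do not transfer between studies, the sketch below computes a mean opinion score and an approximate 95% confidence interval from listener ratings on a 1 to 5 scale. The function name `mean_opinion_score`, the normal-approximation interval, and both sets of ratings are assumptions made up for this example; they do not come from any paper listed here.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, half-width of an approximate 95% confidence interval)."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    mos = statistics.mean(ratings)
    # Normal approximation: 1.96 times the standard error of the mean.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical 1-5 ratings of the same system from two different listener pools.
pool_a = [5, 4, 5, 4, 4, 5, 4, 5]
pool_b = [4, 3, 4, 4, 3, 4, 5, 3]

for name, ratings in (("pool A", pool_a), ("pool B", pool_b)):
    mos, half_width = mean_opinion_score(ratings)
    print(f"{name}: MOS = {mos:.2f} +/- {half_width:.2f}")
```

With these made-up ratings the same system averages 4.50 in one pool and 3.75 in the other, which is the kind of gap that can separate neighbouring leaderboard entries.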

(Image credit: WaveNet: A generative model for raw audio)

Papers

Showing 426–450 of 1249 papers

Title	Date	Tasks	Status
Ensemble prosody prediction for expressive speech synthesis	Apr 3, 2023	Diversity, Ensemble Learning	Unverified
Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback	Jun 2, 2024	Speech Synthesis, text-to-speech	Unverified
Can we steal your vocal identity from the Internet?: Initial investigation of cloning Obama's voice using GAN, WaveNet and low-quality found data	Mar 2, 2018	Generative Adversarial Network, Speech Enhancement	Unverified
Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data?	Jun 11, 2024	Contrastive Learning, Speech Synthesis	Unverified
Evaluating Long-form Text-to-Speech: Comparing the Ratings of Sentences and Paragraphs	Sep 9, 2019	Form, Speech Synthesis	Unverified
Enhancing Multilingual Speech Generation and Recognition Abilities in LLMs with Constructed Code-switched Data	Sep 17, 2024	Speech Synthesis	Unverified
Evaluating Speech-in-Speech Perception via a Humanoid Robot	Dec 19, 2023	Speech Synthesis	Unverified
Enhancing audio quality for expressive Neural Text-to-Speech	Aug 13, 2021	Acoustic Modelling, Speech Synthesis	Unverified
A Preliminary Study on Mandarin-Hakka neural machine translation using small-sized data	Nov 1, 2022	Machine Translation, Speech Synthesis	Unverified
Évaluation objective de plongements pour la synthèse de parole guidée par réseaux de neurones (Objective evaluation of embeddings for speech synthesis guided by neural networks)	Jul 1, 2019	Speech Synthesis	Unverified
Evaluation of TTS Systems in Intelligibility and Comprehension Tasks: a Case Study of HTS-2008 and Multisyn Synthesizers	Sep 1, 2012	Speech Synthesis	Unverified
Évaluation segmentale du système de synthèse HTS pour le français (Segmental evaluation of HTS) [in French]	Jun 1, 2012	Speech Synthesis	Unverified
Everyday Speech in the Indian Subcontinent	Oct 14, 2024	Speech Synthesis	Unverified
A Flow-Based Neural Network for Time Domain Speech Enhancement	Jun 16, 2021	Density Estimation, Speech Enhancement	Unverified
ExcitNet vocoder: A neural excitation model for parametric speech synthesis systems	Nov 9, 2018	Speech Synthesis	Unverified
A comparison of Vietnamese Statistical Parametric Speech Synthesis Systems	May 26, 2020	GPU, Speech Synthesis	Unverified
A Bengali Speech Synthesizer on Android OS	Jul 1, 2012	Speech Synthesis	Unverified
Energy-Based Models For Speech Synthesis	Oct 19, 2023	Speech Synthesis	Unverified
Exploring the Potential of Lexical Paraphrases for Mitigating Noise-Induced Comprehension Errors	Jul 18, 2021	Speech Synthesis	Unverified
Can large-scale vocoded spoofed data improve speech spoofing countermeasure with a self-supervised front end?	Sep 12, 2023	Self-Supervised Learning, Speech Synthesis	Unverified
Exploring Transfer Learning for Urdu Speech Synthesis	Jun 1, 2022	Speech Synthesis, text-to-speech	Unverified
End-to-End Video-To-Speech Synthesis using Generative Adversarial Networks	Apr 27, 2021	Lip Reading, Speech Synthesis	Unverified
Expressive Speech Synthesis via Modeling Expressions with Variational Autoencoder	Apr 6, 2018	Expressive Speech Synthesis, Speech Synthesis	Unverified
End-to-End Text-to-Speech using Latent Duration based on VQ-VAE	Oct 19, 2020	Speech Synthesis, text-to-speech	Unverified
A Preliminary Study on Deep Learning-based Chinese Text to Taiwanese Speech Synthesis System	Sep 1, 2020	Speech Synthesis	Unverified


Datasets: LibriTTS, North American English, LJSpeech, Mandarin Chinese, Blizzard Challenge 2013

Benchmark Results

Dataset: LibriTTS
#	Model	Metric	Claimed	Verified	Status
1	PeriodWave-Turbo-L	PESQ	4.45	—	Unverified
2	BigVGAN-v2	PESQ	4.36	—	Unverified
3	EVA-GAN-big	PESQ	4.35	—	Unverified
4	PeriodWave + FreeU	PESQ	4.25	—	Unverified
5	RFWave	PESQ	4.23	—	Unverified
6	BigVSAN (w/ snakebeta)	PESQ	4.12	—	Unverified
7	BigVSAN	PESQ	4.12	—	Unverified
8	EVA-GAN-base	PESQ	4.03	—	Unverified
9	BigVGAN	PESQ	4.03	—	Unverified
10	Vocos	PESQ	3.7	—	Unverified

Dataset: North American English
#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	4.53	—	Unverified
2	WaveNet (Linguistic)	Mean Opinion Score	4.34	—	Unverified
3	WaveNet (L+F)	Mean Opinion Score	4.21	—	Unverified
4	Tacotron	Mean Opinion Score	4	—	Unverified
5	HMM-driven concatenative	Mean Opinion Score	3.86	—	Unverified
6	LSTM-RNN parametric	Mean Opinion Score	3.67	—	Unverified
7	means	Mean Opinion Score	0	—	Unverified

Dataset: LJSpeech
#	Model	Metric	Claimed	Verified	Status
1	BDDM vocoder	Mean Opinion Score	4.48	—	Unverified
2	DiffWave LARGE	Mean Opinion Score	4.44	—	Unverified
3	Neural HMM	Mean Opinion Score	3.24	—	Unverified
4	Neural HMM Ablation with 1 state per phone	Mean Opinion Score	2.68	—	Unverified

Dataset: Mandarin Chinese
#	Model	Metric	Claimed	Verified	Status
1	WaveNet (L+F)	Mean Opinion Score	4.08	—	Unverified
2	LSTM-RNN parametric	Mean Opinion Score	3.79	—	Unverified
3	HMM-driven concatenative	Mean Opinion Score	3.47	—	Unverified

Dataset: Blizzard Challenge 2013
#	Model	Metric	Claimed	Verified	Status
1	SampleRNN (2-tier)	NLL	1.39	—	Unverified
2	SampleRNN (3-tier)	NLL	1.39	—	Unverified