Speech Synthesis

Speech synthesis is the task of generating speech from another modality, such as text or lip movements.

Please note that the leaderboards here are not directly comparable between studies: they use mean opinion score (MOS) as the metric, and each study collects ratings on different samples from Amazon Mechanical Turk.
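To illustrate why mean opinion scores from different studies are hard to compare, here is a minimal sketch of how such a score is typically summarized: the average of listener ratings on a 1 to 5 scale, with an approximate 95% confidence interval. The function name, the rating lists, and the normal-approximation interval are assumptions made for illustration; they are not taken from any of the listed papers.

```python
import math
import statistics


def mean_opinion_score(ratings, z=1.96):
    """Return (MOS, half-width of an approximate 95% confidence interval)."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    mos = statistics.mean(ratings)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, z * sem


# Hypothetical ratings (1 = bad, 5 = excellent) from two separate listener pools.
study_a = [5, 4, 5, 4, 4, 5, 5, 4]
study_b = [4, 3, 4, 4, 3, 5, 4, 3]

mos_a, ci_a = mean_opinion_score(study_a)
mos_b, ci_b = mean_opinion_score(study_b)
print(f"Study A: {mos_a:.2f} +/- {ci_a:.2f}")
print(f"Study B: {mos_b:.2f} +/- {ci_b:.2f}")
```

Because each study draws its own listeners and its own audio samples, a gap between two such numbers may reflect the rating setup as much as the systems being rated.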

(Image credit: WaveNet: A generative model for raw audio)

Papers


Showing 1151–1175 of 1249 papers

Title	Date	Tasks	Status
Design and Development of Speech Corpora for Air Traffic Control Training	May 1, 2018	Automatic Speech Recognition (ASR), Speech Recognition	Unverified
Designing French Tale Corpora for Entertaining Text To Speech Synthesis	May 1, 2012	Sentence, Speech Synthesis	Unverified
Designing Language Technology Applications: A Wizard of Oz Driven Prototyping Framework	Apr 1, 2014	Machine Translation, Speech Recognition	Unverified
Designing the Latvian Speech Recognition Corpus	May 1, 2014	speech-recognition, Speech Recognition	Unverified
Designing the Next Generation of Intelligent Personal Robotic Assistants for the Physically Impaired	Nov 28, 2019	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Development and Evaluation of Speech Synthesis Corpora for Latvian	May 1, 2020	speech-recognition, Speech Recognition	Unverified
Development of Mandarin-English code-switching speech synthesis system	Nov 1, 2022	Sentence, Speech Synthesis	Unverified
Development of Marathi Part of Speech Tagger Using Statistical Approach	Oct 2, 2013	Information Retrieval, Part-Of-Speech Tagging	Unverified
Dictionary Update for NMF-based Voice Conversion Using an Encoder-Decoder Network	Oct 13, 2016	Decoder, Speech Enhancement	Unverified
DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech	May 26, 2025	Attribute, Emotional Speech Synthesis	Unverified
DiffCSS: Diverse and Expressive Conversational Speech Synthesis with Diffusion Models	Feb 27, 2025	Diversity, Language Modeling	Unverified
AutoTTS: End-to-End Text-to-Speech Synthesis through Differentiable Duration Modeling	Mar 21, 2022	Decoder, Speech Synthesis	Unverified
DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs	Jan 28, 2022	Denoising, Speech Synthesis	Unverified
Diff-TTS: A Denoising Diffusion Model for Text-to-Speech	Apr 3, 2021	Denoising, GPU	Unverified
Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis	Jun 15, 2023	Denoising, Speech Synthesis	Unverified
Direct Simultaneous Speech-to-Speech Translation with Variational Monotonic Multihead Attention	Oct 15, 2021	Simultaneous Speech-to-Speech Translation, Speech Synthesis	Unverified
Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization	Oct 30, 2018	Data Augmentation, Disentanglement	Unverified
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis	Oct 14, 2024	Denoising, Speaker Verification	Unverified
DNN-based Speaker Embedding Using Subjective Inter-speaker Similarity for Multi-speaker Modeling in Speech Synthesis	Jul 19, 2019	Speech Synthesis	Unverified
DNN-based Speech Synthesis for Indian Languages from ASCII text	Aug 18, 2016	Speech Synthesis, text-to-speech	Unverified
DNN-based Speech Synthesis Using Abundant Tags of Spontaneous Speech Corpus	May 1, 2020	Speech Synthesis	Unverified
DNN Filter Bank Cepstral Coefficients for Spoofing Detection	Feb 13, 2017	Speaker Verification, Speech Synthesis	Unverified
On Error Propagation of Diffusion Models	Aug 9, 2023	Denoising, Image Generation	Unverified
Do Prosody Transfer Models Transfer Prosody?	Mar 7, 2023	Speech Synthesis, text-to-speech	Unverified
DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis	May 14, 2025	Audio Generation, Audio Synthesis	Unverified

Datasets: LibriTTS, North American English, LJSpeech, Mandarin Chinese, Blizzard Challenge 2013

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	PeriodWave-Turbo-L	PESQ	4.45	—	Unverified
2	BigVGAN-v2	PESQ	4.36	—	Unverified
3	EVA-GAN-big	PESQ	4.35	—	Unverified
4	PeriodWave + FreeU	PESQ	4.25	—	Unverified
5	RFWave	PESQ	4.23	—	Unverified
6	BigVSAN (w/ snakebeta)	PESQ	4.12	—	Unverified
7	BigVSAN	PESQ	4.12	—	Unverified
8	EVA-GAN-base	PESQ	4.03	—	Unverified
9	BigVGAN	PESQ	4.03	—	Unverified
10	Vocos	PESQ	3.7	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	4.53	—	Unverified
2	WaveNet (Linguistic)	Mean Opinion Score	4.34	—	Unverified
3	WaveNet (L+F)	Mean Opinion Score	4.21	—	Unverified
4	Tacotron	Mean Opinion Score	4	—	Unverified
5	HMM-driven concatenative	Mean Opinion Score	3.86	—	Unverified
6	LSTM-RNN parametric	Mean Opinion Score	3.67	—	Unverified
7	means	Mean Opinion Score	0	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	BDDM vocoder	Mean Opinion Score	4.48	—	Unverified
2	DiffWave LARGE	Mean Opinion Score	4.44	—	Unverified
3	Neural HMM	Mean Opinion Score	3.24	—	Unverified
4	Neural HMM Ablation with 1 state per phone	Mean Opinion Score	2.68	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	WaveNet (L+F)	Mean Opinion Score	4.08	—	Unverified
2	LSTM-RNN parametric	Mean Opinion Score	3.79	—	Unverified
3	HMM-driven concatenative	Mean Opinion Score	3.47	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SampleRNN (2-tier)	NLL	1.39	—	Unverified
2	SampleRNN (3-tier)	NLL	1.39	—	Unverified