Speech Synthesis

Speech synthesis is the task of generating speech from another modality, such as text or lip movements.

Please note that the leaderboards here are not directly comparable between studies, because they use mean opinion score as the metric and each study collects its own sample of ratings from Amazon Mechanical Turk.
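To make the comparability caveat concrete, here is a minimal sketch of how a mean opinion score is typically computed: listeners rate samples on a 1 to 5 scale and the ratings are averaged, often with a confidence interval. The function name `mean_opinion_score`, the normal-approximation interval, and both rating lists are illustrative assumptions, not data or code from any listed paper.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (mos, half_width) for a list of 1-5 listener ratings.

    half_width is a normal-approximation 95% confidence interval,
    which is only a rough guide for small rating counts.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    mos = statistics.mean(ratings)
    std_err = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, 1.96 * std_err


# Hypothetical ratings of the same audio from two different listener pools.
pool_a = [5, 4, 5, 4, 4, 5, 4, 5]
pool_b = [4, 3, 4, 4, 3, 4, 3, 4]

mos_a, ci_a = mean_opinion_score(pool_a)
mos_b, ci_b = mean_opinion_score(pool_b)
print(f"pool A: {mos_a:.2f} +/- {ci_a:.2f}")
print(f"pool B: {mos_b:.2f} +/- {ci_b:.2f}")
```

Because the score depends on who rated which samples, the same system can land at noticeably different values in two studies, which is why the MOS tables on this page are best read within a single study rather than across studies.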

(Image credit: WaveNet: A Generative Model for Raw Audio)

Papers


Showing 276–300 of 1249 papers

Title	Date	Tasks	Status
Combining Human Inputters and Language Services to provide Multi-language support system for International Symposiums	Dec 1, 2016	Automatic Speech Recognition (ASR), Machine Translation	Unverified
Alert!... Calm Down, There is Nothing to Worry About. Warning and Soothing Speech Synthesis.	May 1, 2014	Expressive Speech Synthesis, Sentence	Unverified
A Conventional Orthography for Tunisian Arabic	May 1, 2014	Language Modelling, Machine Translation	Unverified
Collective Learning Mechanism based Optimal Transport Generative Adversarial Network for Non-parallel Voice Conversion	Apr 18, 2025	Generative Adversarial Network, Image Generation	Unverified
Collaborative Watermarking for Adversarial Speech Synthesis	Sep 26, 2023	Speaker Verification, Speech Synthesis	Unverified
ArVoice: A Multi-Speaker Dataset for Arabic Speech Synthesis	May 26, 2025	DeepFake Detection, Face Swapping	Unverified
Code-Mixed Text to Speech Synthesis under Low-Resource Constraints	Dec 2, 2023	Speech Synthesis, text-to-speech	Unverified
CodecFake: Enhancing Anti-Spoofing Models Against Deepfake Audios from Codec-Based Speech Synthesis Systems	Jun 11, 2024	Audio Synthesis, Face Swapping	Unverified
AiRO - an Interactive Learning Tool for Children at Risk of Dyslexia	Jun 1, 2022	Speech Synthesis	Unverified
CoALT: A Software for Comparing Automatic Labelling Tools	May 1, 2012	Speech Recognition, Speech Synthesis	Unverified
CML-TTS A Multilingual Dataset for Speech Synthesis in Low-Resource Languages	Jun 16, 2023	Speech Synthesis, text-to-speech	Unverified
Articulatory-WaveNet: Autoregressive Model For Acoustic-to-Articulatory Inversion	Jun 22, 2020	Speech Synthesis	Unverified
AI-Based IVR	Aug 20, 2024	Speech Synthesis, Speech-to-Text	Unverified
Voice Cloning for Dysarthric Speech Synthesis: Addressing Data Scarcity in Speech-Language Pathology	Mar 3, 2025	Speech Synthesis, Voice Cloning	Unverified
Deep Text-to-Speech System with Seq2Seq Model	Mar 11, 2019	model, Speech Synthesis	Unverified
Cloning one's voice using very limited data in the wild	Oct 7, 2021	Speech Synthesis	Unverified
CleanUNet 2: A Hybrid Speech Denoising Model on Waveform and Spectrogram	Sep 12, 2023	Denoising, Speech Denoising	Unverified
A Hybrid Machine Learning Framework for Optimizing Crop Selection via Agronomic and Economic Forecasting	Jul 6, 2025	Hybrid Machine Learning, speech-recognition	Unverified
ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus	Feb 28, 2023	Speech Synthesis, text-to-speech	Unverified
A Comprehensive Survey on Diffusion Models and Their Applications	Jul 1, 2024	Speech Synthesis, Survey	Unverified
CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network	May 17, 2019	Decoder, Sentence	Unverified
CHATR the Corpus; a 20-year-old archive of Concatenative Speech Synthesis	May 1, 2016	Speech Synthesis	Unverified
A Robotic Agent in a Virtual Environment that Performs Situated Incremental Understanding of Navigational Utterances	Aug 1, 2013	Language Modelling, Speech Recognition	Unverified
A Review of Deep Learning Techniques for Speech Processing	Apr 30, 2023	Automatic Speech Recognition, Deep Learning	Unverified
ChatGPT-EDSS: Empathetic Dialogue Speech Synthesis Trained from ChatGPT-derived Context Word Embeddings	May 23, 2023	Chatbot, Reading Comprehension	Unverified



Datasets: LibriTTS, North American English, LJSpeech, Mandarin Chinese, Blizzard Challenge 2013

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	PeriodWave-Turbo-L	PESQ	4.45	—	Unverified
2	BigVGAN-v2	PESQ	4.36	—	Unverified
3	EVA-GAN-big	PESQ	4.35	—	Unverified
4	PeriodWave + FreeU	PESQ	4.25	—	Unverified
5	RFWave	PESQ	4.23	—	Unverified
6	BigVSAN (w/ snakebeta)	PESQ	4.12	—	Unverified
7	BigVSAN	PESQ	4.12	—	Unverified
8	EVA-GAN-base	PESQ	4.03	—	Unverified
9	BigVGAN	PESQ	4.03	—	Unverified
10	Vocos	PESQ	3.7	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	4.53	—	Unverified
2	WaveNet (Linguistic)	Mean Opinion Score	4.34	—	Unverified
3	WaveNet (L+F)	Mean Opinion Score	4.21	—	Unverified
4	Tacotron	Mean Opinion Score	4	—	Unverified
5	HMM-driven concatenative	Mean Opinion Score	3.86	—	Unverified
6	LSTM-RNN parametric	Mean Opinion Score	3.67	—	Unverified
7	means	Mean Opinion Score	0	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	BDDM vocoder	Mean Opinion Score	4.48	—	Unverified
2	DiffWave LARGE	Mean Opinion Score	4.44	—	Unverified
3	Neural HMM	Mean Opinion Score	3.24	—	Unverified
4	Neural HMM Ablation with 1 state per phone	Mean Opinion Score	2.68	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	WaveNet (L+F)	Mean Opinion Score	4.08	—	Unverified
2	LSTM-RNN parametric	Mean Opinion Score	3.79	—	Unverified
3	HMM-driven concatenative	Mean Opinion Score	3.47	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SampleRNN (2-tier)	NLL	1.39	—	Unverified
2	SampleRNN (3-tier)	NLL	1.39	—	Unverified