Speech Synthesis

Speech synthesis is the task of generating speech from some other modality like text, lip movements etc.

Please note that the leaderboards here are not really comparable between studies - as they use mean opinion score as a metric and collect different samples from Amazon Mechnical Turk.

( Image credit: WaveNet: A generative model for raw audio )

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 826–850 of 1249 papers

Title	Date	Tasks	Status
Controllable Sequence-To-Sequence Neural TTS with LPCNET Backend for Real-time Speech Synthesis on CPU	Feb 25, 2020	CPUProsody Prediction	—Unverified
Controllable speech synthesis by learning discrete phoneme-level prosodic representations	Nov 29, 2022	ClusteringSpeech Synthesis	—Unverified
Controllable Prosody Generation With Partial Inputs	Mar 14, 2023	Speech Synthesistext-to-speech	—Unverified
Corpus Generation for Voice Command in Smart Home and the Effect of Speech Synthesis on End-to-End SLU	May 1, 2020	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified
Corpus Synthesis for Zero-shot ASR domain Adaptation using Large Language Models	Sep 18, 2023	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified
Visual-speech Synthesis of Exaggerated Corrective Feedback	Sep 12, 2020	Speech Synthesis	—Unverified
Counterfactual Activation Editing for Post-hoc Prosody and Mispronunciation Correction in TTS Models	Jun 1, 2025	counterfactualSpeech Synthesis	—Unverified
CPJD Corpus: Crowdsourced Parallel Speech Corpus of Japanese Dialects	May 1, 2018	Machine TranslationSpeech Recognition	—Unverified
Creating New Language and Voice Components for the Updated MaryTTS Text-to-Speech Synthesis Platform	Dec 13, 2017	Speech Synthesistext-to-speech	—Unverified
Creating New Voices using Normalizing Flows	Dec 22, 2023	Speech Synthesistext-to-speech	—Unverified
Creating Personalized Synthetic Voices from Post-Glossectomy Speech with Guided Diffusion Models	May 27, 2023	Speech SynthesisVoice Conversion	—Unverified
Cross-lingual Knowledge Distillation via Flow-based Voice Conversion for Robust Polyglot Text-To-Speech	Sep 15, 2023	Knowledge DistillationSpeech Synthesis	—Unverified
Cross-lingual Low Resource Speaker Adaptation Using Phonological Features	Nov 17, 2021	Speech Synthesis	—Unverified
Cross-lingual Multi-speaker Text-to-speech Synthesis for Voice Cloning without Using Parallel Corpus for Unseen Speakers	Nov 26, 2019	Speech Synthesistext-to-speech	—Unverified
Cross-lingual Multispeaker Text-to-Speech under Limited-Data Scenario	May 21, 2020	AttributeSpeech Synthesis	—Unverified
Cross-lingual Prosody Transfer for Expressive Machine Dubbing	Jun 20, 2023	Expressive Speech SynthesisSpeech Synthesis	—Unverified
Cross-Lingual Text-to-Speech Using Multi-Task Learning and Speaker Classifier Joint Training	Jan 20, 2022	Multi-Task LearningSpeech Synthesis	—Unverified
Cross-speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis	Jul 27, 2021	Expressive Speech SynthesisSpeech Synthesis	—Unverified
CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation	Dec 28, 2024	Speech Synthesis	—Unverified
CrossSpeech: Speaker-independent Acoustic Representation for Cross-lingual Speech Synthesis	Feb 28, 2023	Speech Synthesistext-to-speech	—Unverified
Cross-Utterance Conditioned VAE for Speech Generation	Sep 8, 2023	Speech Synthesistext-to-speech	—Unverified
Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis	Jun 15, 2021	Speech Synthesistext-to-speech	—Unverified
DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech	Oct 17, 2024	DisentanglementQuantization	—Unverified
DDOS: A MOS Prediction Framework utilizing Domain Adaptive Pre-training and Distribution of Opinion Scores	Apr 7, 2022	Self-Supervised LearningSpeech Synthesis	—Unverified
Debatts: Zero-Shot Debating Text-to-Speech Synthesis	Nov 10, 2024	Speech Synthesistext-to-speech	—Unverified

Show:10 25 50

← PrevPage 34 of 50Next →

All datasets LibriTTS North American English LJSpeech Mandarin Chinese Blizzard Challenge 2013

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	PeriodWave-Turbo-L	PESQ	4.45	—	Unverified
2	BigVGAN-v2	PESQ	4.36	—	Unverified
3	EVA-GAN-big	PESQ	4.35	—	Unverified
4	PeriodWave + FreeU	PESQ	4.25	—	Unverified
5	RFWave	PESQ	4.23	—	Unverified
6	BigVSAN (w/ snakebeta)	PESQ	4.12	—	Unverified
7	BigVSAN	PESQ	4.12	—	Unverified
8	EVA-GAN-base	PESQ	4.03	—	Unverified
9	BigVGAN	PESQ	4.03	—	Unverified
10	Vocos	PESQ	3.7	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	4.53	—	Unverified
2	WaveNet (Linguistic)	Mean Opinion Score	4.34	—	Unverified
3	WaveNet (L+F)	Mean Opinion Score	4.21	—	Unverified
4	Tacotron	Mean Opinion Score	4	—	Unverified
5	HMM-driven concatenative	Mean Opinion Score	3.86	—	Unverified
6	LSTM-RNN parametric	Mean Opinion Score	3.67	—	Unverified
7	means	Mean Opinion Score	0	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	BDDM vocoder	Mean Opinion Score	4.48	—	Unverified
2	DiffWave LARGE	Mean Opinion Score	4.44	—	Unverified
3	Neural HMM	Mean Opinion Score	3.24	—	Unverified
4	Neural HMM Ablation with 1 state per phone	Mean Opinion Score	2.68	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	WaveNet (L+F)	Mean Opinion Score	4.08	—	Unverified
2	LSTM-RNN parametric	Mean Opinion Score	3.79	—	Unverified
3	HMM-driven concatenative	Mean Opinion Score	3.47	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SampleRNN (2-tier)	NLL	1.39	—	Unverified
2	SampleRNN (3-tier)	NLL	1.39	—	Unverified