Speech Synthesis

Speech synthesis is the task of generating speech from another modality, such as text or lip movements.

Note that the leaderboards here are not directly comparable across studies: they use mean opinion score (MOS) as the metric, and each study collects its ratings separately, on different samples, through Amazon Mechanical Turk.
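To make the metric concrete, a MOS is the average of listener ratings, typically on a 1 to 5 scale. The minimal sketch below computes a MOS and an approximate 95% confidence interval from one panel's ratings. The helper name `mos_with_ci` is hypothetical, the 1 to 5 range is an assumption, and the interval uses a normal approximation with z = 1.96 (some studies use a t-distribution instead).

```python
import math
import statistics


def mos_with_ci(ratings, z=1.96):
    """Hypothetical helper: return (MOS, CI half-width) for 1-5 listener ratings.

    Uses a normal approximation: half-width = z * sample_std / sqrt(n).
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate spread")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings are assumed to lie on a 1-5 scale")
    mean = statistics.fmean(ratings)
    sd = statistics.stdev(ratings)  # sample standard deviation (n - 1 denominator)
    half_width = z * sd / math.sqrt(len(ratings))
    return mean, half_width


if __name__ == "__main__":
    # Illustrative ratings from one made-up listening panel.
    panel = [4, 5, 4, 3, 5, 4, 4, 5]
    mos, hw = mos_with_ci(panel)
    print(f"MOS = {mos:.2f} +/- {hw:.2f}")  # MOS = 4.25 +/- 0.49
```

Both the mean and the interval width depend on who rated which samples, so two studies evaluating the same system with different listeners and utterances can report different scores.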

(Image credit: WaveNet: A Generative Model for Raw Audio)

Papers

Title	Date	Tasks	Status
Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis	Feb 6, 2020	Disentanglement, Speech Synthesis	Unverified
Fully Unsupervised Training of Few-shot Keyword Spotting	Oct 6, 2022	Keyword Spotting, Metric Learning	Unverified
DiffCSS: Diverse and Expressive Conversational Speech Synthesis with Diffusion Models	Feb 27, 2025	Diversity, Language Modeling	Unverified
DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech	May 26, 2025	Attribute, Emotional Speech Synthesis	Unverified
GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech Synthesis	Jun 29, 2021	Speech Synthesis, text-to-speech	Unverified
GANtron: Emotional Speech Synthesis with Generative Adversarial Networks	Oct 6, 2021	Emotional Speech Synthesis, Speech Synthesis	Unverified
AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-Speech Synthesis	Apr 14, 2025	RAG, Retrieval-augmented Generation	Unverified
A distributed cloud-based dialog system for conversational application development	Sep 1, 2015	Speech Recognition, Speech Synthesis	Unverified
Gender Bias in Instruction-Guided Speech Synthesis Models	Feb 8, 2025	Expressive Speech Synthesis, Speech Synthesis	Unverified
Generacion de voces artificiales infantiles en castellano con acento costarricense (Generating artificial children's voices in Spanish with a Costa Rican accent)	Feb 2, 2021	Speech Synthesis	Unverified
Guided-TTS: Text-to-Speech with Untranscribed Speech	Sep 29, 2021	Speech Synthesis, text-to-speech	Unverified
Dictionary Update for NMF-based Voice Conversion Using an Encoder-Decoder Network	Oct 13, 2016	Decoder, Speech Enhancement	Unverified
Auto Spell Suggestion for High Quality Speech Synthesis in Hindi	Feb 15, 2014	Speech Synthesis, text-to-speech	Unverified
Autoregressive Speech Synthesis without Vector Quantization	Jul 11, 2024	Audio Compression, Diversity	Unverified
Development of Marathi Part of Speech Tagger Using Statistical Approach	Oct 2, 2013	Information Retrieval, Part-Of-Speech Tagging	Unverified
A Discourse-level Multi-scale Prosodic Model for Fine-grained Emotion Analysis	Sep 21, 2023	Emotion Recognition, Speech Synthesis	Unverified
Development of Mandarin-English code-switching speech synthesis system	Nov 1, 2022	Sentence, Speech Synthesis	Unverified
Development and Evaluation of Speech Synthesis Corpora for Latvian	May 1, 2020	speech-recognition, Speech Recognition	Unverified
Autoregressive Speech Synthesis with Next-Distribution Prediction	Dec 22, 2024	Language Modeling, Language Modelling	Unverified
Autoregressive Diffusion Transformer for Text-to-Speech Synthesis	Jun 8, 2024	Audio Generation, Decoder	Unverified
An Empirical Study of Speech Language Models for Prompt-Conditioned Speech Synthesis	Mar 19, 2024	In-Context Learning, Speech Synthesis	Unverified
A Deterministic plus Stochastic Model of the Residual Signal for Improved Parametric Speech Synthesis	Dec 29, 2019	Speech Synthesis	Unverified
Designing the Next Generation of Intelligent Personal Robotic Assistants for the Physically Impaired	Nov 28, 2019	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Designing the Latvian Speech Recognition Corpus	May 1, 2014	speech-recognition, Speech Recognition	Unverified
Designing Language Technology Applications: A Wizard of Oz Driven Prototyping Framework	Apr 1, 2014	Machine Translation, Speech Recognition	Unverified

Datasets: LibriTTS, North American English, LJSpeech, Mandarin Chinese, Blizzard Challenge 2013

Benchmark Results

LibriTTS
#	Model	Metric	Claimed	Verified	Status
1	PeriodWave-Turbo-L	PESQ	4.45	—	Unverified
2	BigVGAN-v2	PESQ	4.36	—	Unverified
3	EVA-GAN-big	PESQ	4.35	—	Unverified
4	PeriodWave + FreeU	PESQ	4.25	—	Unverified
5	RFWave	PESQ	4.23	—	Unverified
6	BigVSAN (w/ snakebeta)	PESQ	4.12	—	Unverified
7	BigVSAN	PESQ	4.12	—	Unverified
8	EVA-GAN-base	PESQ	4.03	—	Unverified
9	BigVGAN	PESQ	4.03	—	Unverified
10	Vocos	PESQ	3.7	—	Unverified

North American English
#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	4.53	—	Unverified
2	WaveNet (Linguistic)	Mean Opinion Score	4.34	—	Unverified
3	WaveNet (L+F)	Mean Opinion Score	4.21	—	Unverified
4	Tacotron	Mean Opinion Score	4	—	Unverified
5	HMM-driven concatenative	Mean Opinion Score	3.86	—	Unverified
6	LSTM-RNN parametric	Mean Opinion Score	3.67	—	Unverified

LJSpeech
#	Model	Metric	Claimed	Verified	Status
1	BDDM vocoder	Mean Opinion Score	4.48	—	Unverified
2	DiffWave LARGE	Mean Opinion Score	4.44	—	Unverified
3	Neural HMM	Mean Opinion Score	3.24	—	Unverified
4	Neural HMM Ablation with 1 state per phone	Mean Opinion Score	2.68	—	Unverified

Mandarin Chinese
#	Model	Metric	Claimed	Verified	Status
1	WaveNet (L+F)	Mean Opinion Score	4.08	—	Unverified
2	LSTM-RNN parametric	Mean Opinion Score	3.79	—	Unverified
3	HMM-driven concatenative	Mean Opinion Score	3.47	—	Unverified

Blizzard Challenge 2013
#	Model	Metric	Claimed	Verified	Status
1	SampleRNN (2-tier)	NLL	1.39	—	Unverified
2	SampleRNN (3-tier)	NLL	1.39	—	Unverified
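The SampleRNN rows report test negative log-likelihood (NLL), where lower is better. As a hedged sketch of how such a number can be computed, assume the model outputs one categorical distribution over quantized amplitude levels per audio sample and that NLL is averaged in bits per sample; check the SampleRNN paper for its exact units and quantization. The helper `nll_bits_per_sample` is hypothetical.

```python
import math


def nll_bits_per_sample(distributions, targets):
    """Hypothetical helper: mean negative log2-likelihood per audio sample.

    distributions: one categorical distribution (list of probabilities) per
        time step, e.g. over 256 levels for 8-bit quantized audio (assumption).
    targets: the quantization level actually observed at each time step.
    """
    if not targets or len(distributions) != len(targets):
        raise ValueError("need one distribution per target sample")
    total_bits = 0.0
    for dist, level in zip(distributions, targets):
        p = dist[level]
        if p <= 0.0:
            raise ValueError("model assigned zero probability to an observed sample")
        total_bits += -math.log2(p)  # information content of the observed level
    return total_bits / len(targets)


if __name__ == "__main__":
    # A model that predicts a uniform distribution over 256 levels needs
    # exactly 8 bits per sample: no better than storing raw 8-bit audio.
    uniform = [1.0 / 256] * 256
    print(nll_bits_per_sample([uniform] * 4, [0, 10, 200, 255]))  # 8.0
```

If the reported values follow this convention, a score near 1.39 would mean the model predicts each sample far more sharply than the 8-bit uniform baseline.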