Speech Synthesis

Speech synthesis is the task of generating speech from another modality, such as text or lip movements.

Please note that the leaderboards here are not really comparable between studies, because they use mean opinion score (MOS) as the metric and each study collects a different sample of ratings from Amazon Mechanical Turk.
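To illustrate why scores from separate listening tests are hard to compare, here is a minimal sketch of how a mean opinion score is typically computed: the average of 1-5 listener ratings, reported with a confidence interval. The function name `mean_opinion_score`, the normal-approximation interval, and the ratings themselves are assumptions made for illustration; they are not taken from any study listed on this page.

```python
import math
import statistics


def mean_opinion_score(ratings, z=1.96):
    """Return (mos, half_width) for a list of 1-5 listener ratings.

    half_width is the half-width of an approximate 95% confidence
    interval (normal approximation, an assumption of this sketch).
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    mos = statistics.fmean(ratings)
    # Sample standard deviation -> standard error -> interval half-width.
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings of the same system from two separate listener pools.
study_a = [5, 4, 5, 4, 5, 4, 4, 5]
study_b = [4, 3, 4, 4, 3, 4, 5, 3]

for name, ratings in (("study A", study_a), ("study B", study_b)):
    mos, half_width = mean_opinion_score(ratings)
    print(f"{name}: MOS = {mos:.2f} +/- {half_width:.2f}")
```

In this made-up example the same system receives noticeably different scores from the two pools, which is the reason a MOS gap between two papers does not by itself show that one model is better.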

(Image credit: WaveNet: A Generative Model for Raw Audio)

Papers

Showing 376–400 of 1249 papers

Title	Date	Tasks	Status
An Objective Evaluation Framework for Pathological Speech Synthesis	Jul 1, 2021	Speech Synthesis, Voice Conversion	Unverified
Advancing Speech Synthesis using EEG	Apr 9, 2020	EEG, Electroencephalogram (EEG)	Unverified
Effect of choice of probability distribution, randomness, and search methods for alignment modeling in sequence-to-sequence text-to-speech synthesis using hard alignment	Oct 28, 2019	Hard Attention, Speech Synthesis	Unverified
Effect of data reduction on sequence-to-sequence neural TTS	Nov 15, 2018	Speech Synthesis	Unverified
Efficient Generative Modeling with Residual Vector Quantization-Based Tokens	Dec 13, 2024	Conditional Image Generation, Image Generation	Unverified
DNN-based Speech Synthesis Using Abundant Tags of Spontaneous Speech Corpus	May 1, 2020	Speech Synthesis	Unverified
DNN-based Speech Synthesis for Indian Languages from ASCII text	Aug 18, 2016	Speech Synthesis, text-to-speech	Unverified
DNN-based Speaker Embedding Using Subjective Inter-speaker Similarity for Multi-speaker Modeling in Speech Synthesis	Jul 19, 2019	Speech Synthesis	Unverified
Bayesian Subspace HMM for the Zerospeech 2020 Challenge	May 19, 2020	Speech Synthesis	Unverified
結合ANN、全域變異數與真實軌跡挑選之基週軌跡產生方法 (A Pitch-contour Generation Method Combining ANN Prediction, Global Variance Matching, and Real-contour Selection) [In Chinese]	Oct 1, 2015	Speech Synthesis	Unverified
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis	Oct 14, 2024	Denoising, Speaker Verification	Unverified
ET-GAN: Cross-Language Emotion Transfer Based on Cycle-Consistent Generative Adversarial Networks	May 27, 2019	Domain Adaptation, Generative Adversarial Network	Unverified
Eigenresiduals for improved Parametric Speech Synthesis	Jan 2, 2020	Speech Synthesis	Unverified
An Initial study on Birdsong Re-synthesis Using Neural Vocoders	Sep 21, 2022	Resynthesis, Speech Synthesis	Unverified
Enhancing Kurdish Text-to-Speech with Native Corpus Training: A High-Quality WaveGlow Vocoder Approach	Sep 10, 2024	Speech Synthesis, text-to-speech	Unverified
Empowering Communication: Speech Technology for Indian and Western Accents through AI-powered Speech Synthesis	Jan 22, 2024	Speaker Verification, Speech Synthesis	Unverified
Balancing Speech Understanding and Generation Using Continual Pre-training for Codec-based Speech LLM	Feb 24, 2025	Automatic Speech Recognition, Language Modeling	Unverified
EmoPro: A Prompt Selection Strategy for Emotional Expression in LM-based Speech Synthesis	Sep 27, 2024	Speech Synthesis	Unverified
A Challenge Set and Methods for Noun-Verb Ambiguity	Oct 1, 2018	Speech Synthesis, text-to-speech	Unverified
Emotion controllable speech synthesis using emotion-unlabeled dataset with the assistance of cross-domain speech emotion recognition	Oct 26, 2020	Emotion Recognition, Speech Emotion Recognition	Unverified
Energy-Based Models For Speech Synthesis	Oct 19, 2023	Speech Synthesis	Unverified
EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model	Jun 17, 2021	Emotional Speech Synthesis, Emotion Classification	Unverified
EMPHASIS: An Emotional Phoneme-based Acoustic Model for Speech Synthesis System	Jun 26, 2018	Emotional Speech Synthesis, Parameter Prediction	Unverified
Emphasized Accent Phrase Prediction from Text for Advertisement Text-To-Speech Synthesis	Dec 1, 2014	Speech Synthesis, text-to-speech	Unverified
Enhancing audio quality for expressive Neural Text-to-Speech	Aug 13, 2021	Acoustic Modelling, Speech Synthesis	Unverified

Datasets: LibriTTS, North American English, LJSpeech, Mandarin Chinese, Blizzard Challenge 2013

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	PeriodWave-Turbo-L	PESQ	4.45	—	Unverified
2	BigVGAN-v2	PESQ	4.36	—	Unverified
3	EVA-GAN-big	PESQ	4.35	—	Unverified
4	PeriodWave + FreeU	PESQ	4.25	—	Unverified
5	RFWave	PESQ	4.23	—	Unverified
6	BigVSAN (w/ snakebeta)	PESQ	4.12	—	Unverified
7	BigVSAN	PESQ	4.12	—	Unverified
8	EVA-GAN-base	PESQ	4.03	—	Unverified
9	BigVGAN	PESQ	4.03	—	Unverified
10	Vocos	PESQ	3.7	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	4.53	—	Unverified
2	WaveNet (Linguistic)	Mean Opinion Score	4.34	—	Unverified
3	WaveNet (L+F)	Mean Opinion Score	4.21	—	Unverified
4	Tacotron	Mean Opinion Score	4	—	Unverified
5	HMM-driven concatenative	Mean Opinion Score	3.86	—	Unverified
6	LSTM-RNN parametric	Mean Opinion Score	3.67	—	Unverified
7	means	Mean Opinion Score	0	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	BDDM vocoder	Mean Opinion Score	4.48	—	Unverified
2	DiffWave LARGE	Mean Opinion Score	4.44	—	Unverified
3	Neural HMM	Mean Opinion Score	3.24	—	Unverified
4	Neural HMM Ablation with 1 state per phone	Mean Opinion Score	2.68	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	WaveNet (L+F)	Mean Opinion Score	4.08	—	Unverified
2	LSTM-RNN parametric	Mean Opinion Score	3.79	—	Unverified
3	HMM-driven concatenative	Mean Opinion Score	3.47	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SampleRNN (2-tier)	NLL	1.39	—	Unverified
2	SampleRNN (3-tier)	NLL	1.39	—	Unverified