Speech Synthesis

Speech synthesis is the task of generating speech from some other modality like text, lip movements etc.

Please note that the leaderboards here are not really comparable between studies - as they use mean opinion score as a metric and collect different samples from Amazon Mechnical Turk.

( Image credit: WaveNet: A generative model for raw audio )

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 576–600 of 1249 papers

Title	Date	Tasks	Status
An Initial study on Birdsong Re-synthesis Using Neural Vocoders	Sep 21, 2022	ResynthesisSpeech Synthesis	—Unverified
Empowering Communication: Speech Technology for Indian and Western Accents through AI-powered Speech Synthesis	Jan 22, 2024	Speaker VerificationSpeech Synthesis	—Unverified
A Challenge Set and Methods for Noun-Verb Ambiguity	Oct 1, 2018	Speech Synthesistext-to-speech	—Unverified
完全基於類神經網路之語音合成系統初步研究 (A Preliminary Study on Fully Neural Network-based Speech Synthesis System) [In Chinese]	Nov 1, 2017	Speech Synthesis	—Unverified
Disentangling Correlated Speaker and Noise for Speech Synthesis via Data Augmentation and Adversarial Factorization	Oct 30, 2018	Data AugmentationDisentanglement	—Unverified
Bahasa Harmony: A Comprehensive Dataset for Bahasa Text-to-Speech Synthesis with Discrete Codec Modeling of EnGen-TTS	Oct 9, 2024	DiversitySpeech Synthesis	—Unverified
Direct Simultaneous Speech-to-Speech Translation with Variational Monotonic Multihead Attention	Oct 15, 2021	Simultaneous Speech-to-Speech TranslationSpeech Synthesis	—Unverified
BAD: An Assistant tool for making verses in Basque	Apr 1, 2012	Speech SynthesisText-To-Speech Synthesis	—Unverified
An In-depth Analysis of the Effect of Text Normalization in Social Media	May 1, 2015	Dependency Parsingnamed-entity-recognition	—Unverified
A Waveform Representation Framework for High-quality Statistical Parametric Speech Synthesis	Oct 6, 2015	Speech SynthesisVocal Bursts Intensity Prediction	—Unverified
AV-Flow: Transforming Text to Audio-Visual Human-like Interactions	Feb 18, 2025	Speech Synthesis	—Unverified
An explainability study of the constant Q cepstral coefficient spoofing countermeasure for automatic speaker verification	Apr 19, 2020	Speaker VerificationSpeech Synthesis	—Unverified
Advances in Speech Vocoding for Text-to-Speech with Continuous Parameters	Jun 19, 2021	Speech Synthesistext-to-speech	—Unverified
Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis	Jun 15, 2023	DenoisingSpeech Synthesis	—Unverified
Diff-TTS: A Denoising Diffusion Model for Text-to-Speech	Apr 3, 2021	DenoisingGPU	—Unverified
An Expert System for Automatic Reading of A Text Written in Standard Arabic	May 8, 2014	Speech Synthesistext-to-speech	—Unverified
DiffGAN-TTS: High-Fidelity and Efficient Text-to-Speech with Denoising Diffusion GANs	Jan 28, 2022	DenoisingSpeech Synthesis	—Unverified
AutoTTS: End-to-End Text-to-Speech Synthesis through Differentiable Duration Modeling	Mar 21, 2022	DecoderSpeech Synthesis	—Unverified
A Variational EM Method for Pole-Zero Modeling of Speech with Mixed Block Sparse and Gaussian Excitation	Jun 24, 2017	speech-recognitionSpeech Recognition	—Unverified
DiffCSS: Diverse and Expressive Conversational Speech Synthesis with Diffusion Models	Feb 27, 2025	DiversityLanguage Modeling	—Unverified
DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech	May 26, 2025	AttributeEmotional Speech Synthesis	—Unverified
AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-Speech Synthesis	Apr 14, 2025	RAGRetrieval-augmented Generation	—Unverified
An Experimental Study: Assessing the Combined Framework of WavLM and BEST-RQ for Text-to-Speech Synthesis	Dec 8, 2023	BenchmarkingQuantization	—Unverified
A distributed cloud-based dialog system for conversational application development	Sep 1, 2015	Speech RecognitionSpeech Synthesis	—Unverified
Dictionary Update for NMF-based Voice Conversion Using an Encoder-Decoder Network	Oct 13, 2016	DecoderSpeech Enhancement	—Unverified

Show:10 25 50

← PrevPage 24 of 50Next →

All datasets LibriTTS North American English LJSpeech Mandarin Chinese Blizzard Challenge 2013

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	PeriodWave-Turbo-L	PESQ	4.45	—	Unverified
2	BigVGAN-v2	PESQ	4.36	—	Unverified
3	EVA-GAN-big	PESQ	4.35	—	Unverified
4	PeriodWave + FreeU	PESQ	4.25	—	Unverified
5	RFWave	PESQ	4.23	—	Unverified
6	BigVSAN (w/ snakebeta)	PESQ	4.12	—	Unverified
7	BigVSAN	PESQ	4.12	—	Unverified
8	EVA-GAN-base	PESQ	4.03	—	Unverified
9	BigVGAN	PESQ	4.03	—	Unverified
10	Vocos	PESQ	3.7	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	4.53	—	Unverified
2	WaveNet (Linguistic)	Mean Opinion Score	4.34	—	Unverified
3	WaveNet (L+F)	Mean Opinion Score	4.21	—	Unverified
4	Tacotron	Mean Opinion Score	4	—	Unverified
5	HMM-driven concatenative	Mean Opinion Score	3.86	—	Unverified
6	LSTM-RNN parametric	Mean Opinion Score	3.67	—	Unverified
7	means	Mean Opinion Score	0	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	BDDM vocoder	Mean Opinion Score	4.48	—	Unverified
2	DiffWave LARGE	Mean Opinion Score	4.44	—	Unverified
3	Neural HMM	Mean Opinion Score	3.24	—	Unverified
4	Neural HMM Ablation with 1 state per phone	Mean Opinion Score	2.68	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	WaveNet (L+F)	Mean Opinion Score	4.08	—	Unverified
2	LSTM-RNN parametric	Mean Opinion Score	3.79	—	Unverified
3	HMM-driven concatenative	Mean Opinion Score	3.47	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SampleRNN (2-tier)	NLL	1.39	—	Unverified
2	SampleRNN (3-tier)	NLL	1.39	—	Unverified