Speech Synthesis

Speech synthesis is the task of generating speech from another modality, such as text or lip movements.

Note that the leaderboards here are not directly comparable between studies, because they use mean opinion score (MOS) as the metric and each study collects a different sample of ratings from Amazon Mechanical Turk.
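To make the comparability caveat concrete, here is a minimal sketch of how a mean opinion score is typically computed: the average of 1-to-5 listener ratings, reported with a confidence interval. The function name `mean_opinion_score`, the normal-approximation interval, and both rating lists are illustrative assumptions, not data or methodology from any listed paper.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, 95% CI half-width) for a list of 1-5 listener ratings."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    mos = statistics.fmean(ratings)
    # Normal-approximation interval; an assumption, since studies differ here.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings of the same audio from two separate listener pools.
study_a = [5, 4, 5, 4, 5, 4, 4, 5]
study_b = [4, 3, 4, 4, 3, 4, 5, 3]

mos_a, ci_a = mean_opinion_score(study_a)
mos_b, ci_b = mean_opinion_score(study_b)
print(f"study A: {mos_a:.2f} +/- {ci_a:.2f}")
print(f"study B: {mos_b:.2f} +/- {ci_b:.2f}")
```

Because the score depends on who rated which clips, two studies can report different values for an identical system, which is why scores from separate papers should be read side by side with caution.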

(Image credit: WaveNet: A Generative Model for Raw Audio)

Papers


Showing 601–650 of 1249 papers

Title	Date	Tasks	Status
POS-Tag Based Poetry Generation with WordNet	Aug 1, 2013	POS, Speech Synthesis	Unverified
Practical Evaluation of Human and Synthesized Speech for Virtual Human Dialogue Systems	May 1, 2012	Speech Synthesis	Unverified
Predicting Expressive Speaking Style From Text In End-To-End Speech Synthesis	Aug 4, 2018	Speech Synthesis, text-to-speech	Unverified
Predicting phoneme-level prosody latents using AR and flow-based Prior Networks for expressive speech synthesis	Nov 2, 2022	Expressive Speech Synthesis, Speech Synthesis	Unverified
Predicting Phrase Breaks in Classical and Modern Standard Arabic Text	May 1, 2012	Chunking, Human Parsing	Unverified
Predicting Romanian Stress Assignment	Apr 1, 2014	Speech Synthesis, Text-To-Speech Synthesis	Unverified
Preliminary study on using vector quantization latent spaces for TTS/VC systems with consistent performance	Jun 25, 2021	Quantization, Speaker anonymization	Unverified
Pretraining Strategies, Waveform Model Choice, and Acoustic Configurations for Multi-Speaker End-to-End Speech Synthesis	Nov 10, 2020	Speech Synthesis	Unverified
PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Dependent Adaptive Prior	Jun 11, 2021	Audio Generation, Denoising	Unverified
Privacy-oriented manipulation of speaker representations	Oct 10, 2023	Speaker Recognition, Speech Synthesis	Unverified
Probabilistic Dialogue Models with Prior Domain Knowledge	Jul 1, 2012	Dialogue Management, Semantic Parsing	Unverified
Probing Speaker-specific Features in Speaker Representations	Jan 9, 2025	Self-Supervised Learning, Speaker Verification	Unverified
Probing the Feasibility of Multilingual Speaker Anonymization	Jul 3, 2024	Speaker anonymization, Speech Synthesis	Unverified
PROEMO: Prompt-Driven Text-to-Speech Synthesis Based on Emotion and Intensity Control	Jan 10, 2025	Speech Synthesis, text-to-speech	Unverified
PSCodec: A Series of High-Fidelity Low-bitrate Neural Speech Codecs Leveraging Prompt Encoders	Apr 3, 2024	Representation Learning, Speaker Verification	Unverified
The VoiceMOS Challenge 2023: Zero-shot Subjective Speech Quality Prediction for Multiple Domains	Oct 4, 2023	Speech Synthesis, text-to-speech	Unverified
Listening while Speaking and Visualizing: Improving ASR through Multimodal Chain	Jun 3, 2019	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation	Aug 1, 2024	Representation Learning, Speech Synthesis	Unverified
Enhancing Kurdish Text-to-Speech with Native Corpus Training: A High-Quality WaveGlow Vocoder Approach	Sep 10, 2024	Speech Synthesis, text-to-speech	Unverified
Voice Cloning for Dysarthric Speech Synthesis: Addressing Data Scarcity in Speech-Language Pathology	Mar 3, 2025	Speech Synthesis, Voice Cloning	Unverified
UDDETTS: Unifying Discrete and Dimensional Emotions for Controllable Emotional Text-to-Speech	May 15, 2025	Emotional Speech Synthesis, Language Modeling	Unverified
完全基於類神經網路之語音合成系統初步研究 (A Preliminary Study on Fully Neural Network-based Speech Synthesis System) [In Chinese]	Nov 1, 2017	Speech Synthesis	Unverified
A Bengali HMM Based Speech Synthesis System	Jun 16, 2014	Speech Synthesis, text-to-speech	Unverified
A Bengali Speech Synthesizer on Android OS	Jul 1, 2012	Speech Synthesis	Unverified
Accelerating Autoregressive Speech Synthesis Inference With Speech Speculative Decoding	May 21, 2025	Speech Synthesis	Unverified
Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding	Oct 17, 2024	Speech Synthesis	Unverified
Accent Conversion in Text-To-Speech Using Multi-Level VAE and Adversarial Training	Jun 3, 2024	Speech Synthesis, text-to-speech	Unverified
Accent conversion using discrete units with parallel data synthesized from controllable accented TTS	Sep 30, 2024	Data Augmentation, Speech Synthesis	Unverified
Accented Text-to-Speech Synthesis with Limited Data	May 8, 2023	Speech Synthesis, text-to-speech	Unverified
Accurate synthesis of Dysarthric Speech for ASR data augmentation	Aug 16, 2023	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
A Challenge Set and Methods for Noun-Verb Ambiguity	Oct 1, 2018	Speech Synthesis, text-to-speech	Unverified
A Comparison of Deep Learning MOS Predictors for Speech Synthesis Quality	Apr 5, 2022	Benchmarking, Self-Supervised Learning	Unverified
A Comparison of Manual and Automatic Voice Repair for Individual with Vocal Disabilities	Sep 1, 2015	Speech Synthesis	Unverified
A comparison of recent waveform generation and acoustic modeling methods for neural-network-based speech synthesis	Apr 7, 2018	Speech Synthesis	Unverified
A comparison of Vietnamese Statistical Parametric Speech Synthesis Systems	May 26, 2020	GPU, Speech Synthesis	Unverified
A Comprehensive Survey on Diffusion Models and Their Applications	Jul 1, 2024	Speech Synthesis, Survey	Unverified
A Conventional Orthography for Tunisian Arabic	May 1, 2014	Language Modelling, Machine Translation	Unverified
A Corpus of Neutral Voice Speech in Brazilian Portuguese	May 21, 2021	Speech Synthesis, text-to-speech	Unverified
Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History	Jun 16, 2022	Self-Supervised Learning, Sentence	Unverified
Adaptation de la prononciation pour la synthèse de la parole spontanée en utilisant des informations linguistiques (Pronunciation adaptation for spontaneous speech synthesis using linguistic information)	Jul 1, 2016	Speech Synthesis	Unverified
Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers	Nov 1, 2022	parameter-efficient fine-tuning, Speech Synthesis	Unverified
Adaptive Parser-Centric Text Normalization	Aug 1, 2013	Machine Translation, Speech Recognition	Unverified
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios	Apr 1, 2022	Speech Synthesis, text-to-speech	Unverified
A Data-Driven Investigation of Noise-Adaptive Utterance Generation with Linguistic Modification	Oct 19, 2022	Speech Synthesis, Text Generation	Unverified
AdaVocoder: Adaptive Vocoder for Custom Voice	Mar 18, 2022	Speech Synthesis, Transfer Learning	Unverified
A Deep Learning Approach to Data-driven Parameterizations for Statistical Parametric Speech Synthesis	Sep 30, 2014	Denoising, Speech Synthesis	Unverified
A Deterministic plus Stochastic Model of the Residual Signal for Improved Parametric Speech Synthesis	Dec 29, 2019	Speech Synthesis	Unverified
A Discourse-level Multi-scale Prosodic Model for Fine-grained Emotion Analysis	Sep 21, 2023	Emotion Recognition, Speech Synthesis	Unverified
A distributed cloud-based dialog system for conversational application development	Sep 1, 2015	Speech Recognition, Speech Synthesis	Unverified
Advances in Speech Vocoding for Text-to-Speech with Continuous Parameters	Jun 19, 2021	Speech Synthesis, text-to-speech	Unverified


Datasets: LibriTTS, North American English, LJSpeech, Mandarin Chinese, Blizzard Challenge 2013

Benchmark Results

LibriTTS

#	Model	Metric	Claimed	Verified	Status
1	PeriodWave-Turbo-L	PESQ	4.45	—	Unverified
2	BigVGAN-v2	PESQ	4.36	—	Unverified
3	EVA-GAN-big	PESQ	4.35	—	Unverified
4	PeriodWave + FreeU	PESQ	4.25	—	Unverified
5	RFWave	PESQ	4.23	—	Unverified
6	BigVSAN (w/ snakebeta)	PESQ	4.12	—	Unverified
7	BigVSAN	PESQ	4.12	—	Unverified
8	EVA-GAN-base	PESQ	4.03	—	Unverified
9	BigVGAN	PESQ	4.03	—	Unverified
10	Vocos	PESQ	3.7	—	Unverified

North American English

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	4.53	—	Unverified
2	WaveNet (Linguistic)	Mean Opinion Score	4.34	—	Unverified
3	WaveNet (L+F)	Mean Opinion Score	4.21	—	Unverified
4	Tacotron	Mean Opinion Score	4	—	Unverified
5	HMM-driven concatenative	Mean Opinion Score	3.86	—	Unverified
6	LSTM-RNN parametric	Mean Opinion Score	3.67	—	Unverified
7	means	Mean Opinion Score	0	—	Unverified

LJSpeech

#	Model	Metric	Claimed	Verified	Status
1	BDDM vocoder	Mean Opinion Score	4.48	—	Unverified
2	DiffWave LARGE	Mean Opinion Score	4.44	—	Unverified
3	Neural HMM	Mean Opinion Score	3.24	—	Unverified
4	Neural HMM Ablation with 1 state per phone	Mean Opinion Score	2.68	—	Unverified

Mandarin Chinese

#	Model	Metric	Claimed	Verified	Status
1	WaveNet (L+F)	Mean Opinion Score	4.08	—	Unverified
2	LSTM-RNN parametric	Mean Opinion Score	3.79	—	Unverified
3	HMM-driven concatenative	Mean Opinion Score	3.47	—	Unverified

Blizzard Challenge 2013

#	Model	Metric	Claimed	Verified	Status
1	SampleRNN (2-tier)	NLL	1.39	—	Unverified
2	SampleRNN (3-tier)	NLL	1.39	—	Unverified