Speech Synthesis

Speech synthesis is the task of generating speech from another modality, such as text or lip movements.

Note that the leaderboards here are not directly comparable between studies, because they use mean opinion score (MOS) as a metric and each study collects its ratings on different samples from Amazon Mechanical Turk.
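To make the comparability caveat concrete, here is a minimal sketch of how a mean opinion score and an approximate 95% confidence interval could be computed from listener ratings. The function name `mean_opinion_score`, the 1-5 rating scale, the normal-approximation interval, and both rating lists are assumptions made for illustration; they are not taken from any of the listed papers.

```python
import math
import statistics


def mean_opinion_score(ratings, z=1.96):
    """Return (MOS, half-width of an approximate 95% confidence interval)."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mos = statistics.mean(ratings)
    # Normal-approximation interval: z * sample std / sqrt(n).
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings from two separate studies, each with its own
# listener pool and its own set of audio samples.
study_a = [5, 4, 5, 4, 4, 5, 4, 5]
study_b = [4, 3, 4, 4, 3, 4, 5, 3]

mos_a, ci_a = mean_opinion_score(study_a)
mos_b, ci_b = mean_opinion_score(study_b)
print(f"study A: {mos_a:.2f} +/- {ci_a:.2f}")
print(f"study B: {mos_b:.2f} +/- {ci_b:.2f}")
```

Because each score depends on who rated which samples, a gap between two published MOS values may reflect differences in the listening tests rather than in the systems themselves.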

(Image credit: WaveNet: A Generative Model for Raw Audio)

Papers

Showing 1201–1249 of 1249 papers

Title	Date	Tasks	Status
ELF: Encoding Speaker-Specific Latent Speech Feature for Speech Synthesis	Nov 20, 2023	Speech Synthesis	Unverified
End-to-End Binaural Speech Synthesis	Jul 8, 2022	Decoder, Speech Synthesis	Unverified
End-to-End Emotional Speech Synthesis Using Style Tokens and Semi-Supervised Training	Jun 26, 2019	Emotional Speech Synthesis, Emotion Recognition	Unverified
End-to-End Feedback Loss in Speech Chain Framework via Straight-Through Estimator	Oct 31, 2018	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
End-to-End Text-to-Speech using Latent Duration based on VQ-VAE	Oct 19, 2020	Speech Synthesis, text-to-speech	Unverified
End-to-End Video-To-Speech Synthesis using Generative Adversarial Networks	Apr 27, 2021	Lip Reading, Speech Synthesis	Unverified
Energy-Based Models For Speech Synthesis	Oct 19, 2023	Speech Synthesis	Unverified
Enhancing audio quality for expressive Neural Text-to-Speech	Aug 13, 2021	Acoustic Modelling, Speech Synthesis	Unverified
Enhancing Multilingual Speech Generation and Recognition Abilities in LLMs with Constructed Code-switched Data	Sep 17, 2024	Speech Synthesis	Unverified
Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback	Jun 2, 2024	Speech Synthesis, text-to-speech	Unverified
Ensemble prosody prediction for expressive speech synthesis	Apr 3, 2023	Diversity, Ensemble Learning	Unverified
Environment Aware Text-to-Speech Synthesis	Oct 8, 2021	Attribute, Disentanglement	Unverified
EPIC TTS Models: Empirical Pruning Investigations Characterizing Text-To-Speech Models	Sep 22, 2022	Speech Synthesis, text-to-speech	Unverified
Étude comparative des paramètres d'entrée pour la synthèse expressive audiovisuelle de la parole par DNNs (Comparative study of input parameters for DNN-based expressive audiovisual speech synthesis)	Jun 1, 2020	Speech Synthesis	Unverified
Evaluating expressive speech synthesis from audiobook corpora for conversational phrases	May 1, 2012	Clustering, Expressive Speech Synthesis	Unverified
Evaluating Features and Metrics for High-Quality Simulation of Early Vocal Learning of Vowels	May 20, 2020	Speech Synthesis	Unverified
Evaluating Long-form Text-to-Speech: Comparing the Ratings of Sentences and Paragraphs	Sep 9, 2019	Form, Speech Synthesis	Unverified
Evaluating Speech-in-Speech Perception via a Humanoid Robot	Dec 19, 2023	Speech Synthesis	Unverified
Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model	May 16, 2024	Hallucination, Language Modeling	Unverified
Évaluation objective de plongements pour la synthèse de parole guidée par réseaux de neurones (Objective evaluation of embeddings for speech synthesis guided by neural networks)	Jul 1, 2019	Speech Synthesis	Unverified
Evaluation of TTS Systems in Intelligibility and Comprehension Tasks: a Case Study of HTS-2008 and Multisyn Synthesizers	Sep 1, 2012	Speech Synthesis	Unverified
Évaluation segmentale du système de synthèse HTS pour le français (Segmental evaluation of HTS) [in French]	Jun 1, 2012	Speech Synthesis	Unverified
Everyday Speech in the Indian Subcontinent	Oct 14, 2024	Speech Synthesis	Unverified
Excitation-based Voice Quality Analysis and Modification	Jan 2, 2020	Speech Synthesis	Unverified
ExcitNet vocoder: A neural excitation model for parametric speech synthesis systems	Nov 9, 2018	Speech Synthesis	Unverified
Exploring Speech Enhancement for Low-resource Speech Synthesis	Sep 19, 2023	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Exploring the encoding of linguistic representations in the Fully-Connected Layer of generative CNNs for Speech	Jan 13, 2025	Speech Synthesis	Unverified
Exploring the Potential of Lexical Paraphrases for Mitigating Noise-Induced Comprehension Errors	Jul 18, 2021	Speech Synthesis	Unverified
Exploring Transfer Learning for Urdu Speech Synthesis	Jun 1, 2022	Speech Synthesis, text-to-speech	Unverified
Expressive Neural Voice Cloning	Jan 30, 2021	Speech Synthesis, Style Transfer	Unverified
Expressive Speech Synthesis via Modeling Expressions with Variational Autoencoder	Apr 6, 2018	Expressive Speech Synthesis, Speech Synthesis	Unverified
Expressive, Variable, and Controllable Duration Modelling in TTS	Jun 28, 2022	Normalising Flows, Speech Synthesis	Unverified
Expressivity and Speech Synthesis	Apr 30, 2024	Expressive Speech Synthesis, Speech Synthesis	Unverified
EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis	Aug 10, 2023	Resynthesis, Speech Synthesis	Unverified
Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data	Feb 29, 2024	Representation Learning, Speech Synthesis	Unverified
F0 Modeling In Hmm-Based Speech Synthesis System Using Deep Belief Network	Feb 18, 2015	Clustering, Speaker Verification	Unverified
FaceSpeak: Expressive and High-Quality Speech Synthesis from Human Portraits of Different Styles	Jan 2, 2025	Speech Synthesis, text-to-speech	Unverified
Face-StyleSpeech: Enhancing Zero-shot Speech Synthesis from Face Images with Improved Face-to-Speech Mapping	Sep 25, 2023	Speech Synthesis, text-to-speech	Unverified
Facial Expression-Enhanced TTS: Combining Face Representation and Emotion Intensity for Adaptive Speech	Sep 24, 2024	Emotional Speech Synthesis, Speech Synthesis	Unverified
FADEL: Uncertainty-aware Fake Audio Detection with Evidential Deep Learning	Apr 22, 2025	Deep Learning, Speaker Verification	Unverified
FA-GAN: Artifacts-free and Phase-aware High-fidelity GAN-based Vocoder	Jul 5, 2024	Generative Adversarial Network, Speech Synthesis	Unverified
fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit	Nov 1, 2021	Speech Synthesis, text-to-speech	Unverified
Fast and Accurate Decision Trees for Natural Language Processing Tasks	Sep 1, 2017	Attribute, BIG-bench Machine Learning	Unverified
Fast and High-Quality Auto-Regressive Speech Synthesis via Speculative Decoding	Oct 29, 2024	Speech Synthesis, text-to-speech	Unverified
Fast and small footprint Hybrid HMM-HiFiGAN based system for speech synthesis in Indian languages	Feb 13, 2023	Speech Synthesis, text-to-speech	Unverified
Fast Bootstrapping of Grapheme to Phoneme System for Under-resourced Languages - Application to the Iban Language	Oct 1, 2013	Speech Recognition, Speech Synthesis	Unverified
Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices	Jun 20, 2016	Quantization, Speech Synthesis	Unverified
Fast, High-Quality and Parameter-Efficient Articulatory Synthesis using Differentiable DSP	Sep 4, 2024	Audio Synthesis, Computational Efficiency	Unverified
Fast Labeling and Transcription with the Speechalyzer Toolkit	May 1, 2012	Audio Classification, Benchmarking	Unverified

Datasets (the benchmark tables below follow this order): LibriTTS, North American English, LJSpeech, Mandarin Chinese, Blizzard Challenge 2013

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	PeriodWave-Turbo-L	PESQ	4.45	—	Unverified
2	BigVGAN-v2	PESQ	4.36	—	Unverified
3	EVA-GAN-big	PESQ	4.35	—	Unverified
4	PeriodWave + FreeU	PESQ	4.25	—	Unverified
5	RFWave	PESQ	4.23	—	Unverified
6	BigVSAN (w/ snakebeta)	PESQ	4.12	—	Unverified
7	BigVSAN	PESQ	4.12	—	Unverified
8	EVA-GAN-base	PESQ	4.03	—	Unverified
9	BigVGAN	PESQ	4.03	—	Unverified
10	Vocos	PESQ	3.7	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	4.53	—	Unverified
2	WaveNet (Linguistic)	Mean Opinion Score	4.34	—	Unverified
3	WaveNet (L+F)	Mean Opinion Score	4.21	—	Unverified
4	Tacotron	Mean Opinion Score	4	—	Unverified
5	HMM-driven concatenative	Mean Opinion Score	3.86	—	Unverified
6	LSTM-RNN parametric	Mean Opinion Score	3.67	—	Unverified
7	means	Mean Opinion Score	0	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	BDDM vocoder	Mean Opinion Score	4.48	—	Unverified
2	DiffWave LARGE	Mean Opinion Score	4.44	—	Unverified
3	Neural HMM	Mean Opinion Score	3.24	—	Unverified
4	Neural HMM Ablation with 1 state per phone	Mean Opinion Score	2.68	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	WaveNet (L+F)	Mean Opinion Score	4.08	—	Unverified
2	LSTM-RNN parametric	Mean Opinion Score	3.79	—	Unverified
3	HMM-driven concatenative	Mean Opinion Score	3.47	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SampleRNN (2-tier)	NLL	1.39	—	Unverified
2	SampleRNN (3-tier)	NLL	1.39	—	Unverified