Speech Synthesis

Speech synthesis is the task of generating speech from another modality, such as text or lip movements.

Please note that the leaderboards here are not directly comparable between studies: they use mean opinion score (MOS) as the metric, and each study collects its own samples from Amazon Mechanical Turk.
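To illustrate why MOS values from separate studies are hard to compare, here is a minimal Python sketch that computes a mean opinion score with an approximate 95% confidence interval. The function name `mean_opinion_score` and both rating lists are hypothetical, invented for this example; the interval uses a simple normal approximation, which is an assumption rather than the procedure any listed paper followed.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, 95% CI half-width) for a list of 1-5 listener ratings."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    mos = statistics.mean(ratings)
    # Normal approximation: 1.96 times the standard error of the mean.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings of the same audio from two separate listener pools.
study_a = [5, 4, 5, 4, 4, 5, 4, 5]
study_b = [4, 3, 4, 4, 3, 4, 5, 3]

mos_a, ci_a = mean_opinion_score(study_a)
mos_b, ci_b = mean_opinion_score(study_b)
print(f"Study A: MOS {mos_a:.2f} +/- {ci_a:.2f}")
print(f"Study B: MOS {mos_b:.2f} +/- {ci_b:.2f}")
```

Two listener pools can score identical audio quite differently, so a gap between MOS values reported in different papers may reflect the raters as much as the systems.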

(Image credit: WaveNet: A Generative Model for Raw Audio)

Papers

Showing 1101–1150 of 1249 papers

Title	Date	Tasks	Status
Semi-supervised learning for continuous emotional intensity controllable speech synthesis with disentangled representations	Nov 11, 2022	Emotional Speech Synthesis, Speech Synthesis	Unverified
Continuous Expressive Speaking Styles Synthesis based on CVSM and MR-HMM	Dec 1, 2016	Expressive Speech Synthesis, Speech Recognition	Unverified
Continuous Speech Synthesis using per-token Latent Diffusion	Oct 21, 2024	Image Generation, Quantization	Unverified
Continuous Wavelet Vocoder-based Decomposition of Parametric Speech Waveform Synthesis	Jun 12, 2021	Speech Synthesis	Unverified
Controllable Accented Text-to-Speech Synthesis	Sep 22, 2022	Speech Synthesis, text-to-speech	Unverified
Controllable Context-aware Conversational Speech Synthesis	Jun 21, 2021	Speech Synthesis	Unverified
Controllable Data Generation by Deep Learning: A Review	Jul 19, 2022	Deep Learning, Speech Synthesis	Unverified
Controllable Generation of Artificial Speaker Embeddings through Discovery of Principal Directions	Oct 26, 2023	Speech Synthesis	Unverified
Controllable Neural Prosody Synthesis	Aug 7, 2020	Speech Synthesis	Unverified
Controllable neural text-to-speech synthesis using intuitive prosodic features	Sep 14, 2020	Sentence, Speech Synthesis	Unverified
Controllable Sequence-To-Sequence Neural TTS with LPCNET Backend for Real-time Speech Synthesis on CPU	Feb 25, 2020	CPU, Prosody Prediction	Unverified
Controllable speech synthesis by learning discrete phoneme-level prosodic representations	Nov 29, 2022	Clustering, Speech Synthesis	Unverified
Controllable Prosody Generation With Partial Inputs	Mar 14, 2023	Speech Synthesis, text-to-speech	Unverified
Corpus Generation for Voice Command in Smart Home and the Effect of Speech Synthesis on End-to-End SLU	May 1, 2020	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Corpus Synthesis for Zero-shot ASR domain Adaptation using Large Language Models	Sep 18, 2023	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Visual-speech Synthesis of Exaggerated Corrective Feedback	Sep 12, 2020	Speech Synthesis	Unverified
Counterfactual Activation Editing for Post-hoc Prosody and Mispronunciation Correction in TTS Models	Jun 1, 2025	counterfactual, Speech Synthesis	Unverified
CPJD Corpus: Crowdsourced Parallel Speech Corpus of Japanese Dialects	May 1, 2018	Machine Translation, Speech Recognition	Unverified
Creating New Language and Voice Components for the Updated MaryTTS Text-to-Speech Synthesis Platform	Dec 13, 2017	Speech Synthesis, text-to-speech	Unverified
Creating New Voices using Normalizing Flows	Dec 22, 2023	Speech Synthesis, text-to-speech	Unverified
Creating Personalized Synthetic Voices from Post-Glossectomy Speech with Guided Diffusion Models	May 27, 2023	Speech Synthesis, Voice Conversion	Unverified
Cross-lingual Knowledge Distillation via Flow-based Voice Conversion for Robust Polyglot Text-To-Speech	Sep 15, 2023	Knowledge Distillation, Speech Synthesis	Unverified
Cross-lingual Low Resource Speaker Adaptation Using Phonological Features	Nov 17, 2021	Speech Synthesis	Unverified
Cross-lingual Multi-speaker Text-to-speech Synthesis for Voice Cloning without Using Parallel Corpus for Unseen Speakers	Nov 26, 2019	Speech Synthesis, text-to-speech	Unverified
Cross-lingual Multispeaker Text-to-Speech under Limited-Data Scenario	May 21, 2020	Attribute, Speech Synthesis	Unverified
Cross-lingual Prosody Transfer for Expressive Machine Dubbing	Jun 20, 2023	Expressive Speech Synthesis, Speech Synthesis	Unverified
Cross-Lingual Text-to-Speech Using Multi-Task Learning and Speaker Classifier Joint Training	Jan 20, 2022	Multi-Task Learning, Speech Synthesis	Unverified
Cross-speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis	Jul 27, 2021	Expressive Speech Synthesis, Speech Synthesis	Unverified
CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation	Dec 28, 2024	Speech Synthesis	Unverified
CrossSpeech: Speaker-independent Acoustic Representation for Cross-lingual Speech Synthesis	Feb 28, 2023	Speech Synthesis, text-to-speech	Unverified
Cross-Utterance Conditioned VAE for Speech Generation	Sep 8, 2023	Speech Synthesis, text-to-speech	Unverified
Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis	Jun 15, 2021	Speech Synthesis, text-to-speech	Unverified
DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech	Oct 17, 2024	Disentanglement, Quantization	Unverified
DDOS: A MOS Prediction Framework utilizing Domain Adaptive Pre-training and Distribution of Opinion Scores	Apr 7, 2022	Self-Supervised Learning, Speech Synthesis	Unverified
Debatts: Zero-Shot Debating Text-to-Speech Synthesis	Nov 10, 2024	Speech Synthesis, text-to-speech	Unverified
Decoupled Pronunciation and Prosody Modeling in Meta-Learning-Based Multilingual Speech Synthesis	Sep 14, 2022	Decoder, Meta-Learning	Unverified
DeepA: A Deep Neural Analyzer For Speech And Singing Vocoding	Oct 13, 2021	Speech Synthesis, Voice Conversion	Unverified
Deep Denoising Auto-encoder for Statistical Speech Synthesis	Jun 17, 2015	Denoising, Speech Synthesis	Unverified
Deep Encoder-Decoder Models for Unsupervised Learning of Controllable Speech Synthesis	Jul 30, 2018	Acoustic Modelling, Decoder	Unverified
Deep Feed-forward Sequential Memory Networks for Speech Synthesis	Feb 26, 2018	speech-recognition, Speech Recognition	Unverified
Deep Learning the EEG Manifold for Phonological Categorization from Active Thoughts	Apr 8, 2019	Binary Classification, Deep Learning	Unverified
Deep MOS Predictor for Synthetic Speech Using Cluster-Based Modeling	Aug 9, 2020	Deep Learning, Speech Synthesis	Unverified
Deep Performer: Score-to-Audio Music Performance Synthesis	Feb 12, 2022	Decoder, Speech Synthesis	Unverified
Deep Speech Synthesis from Articulatory Features	Jan 16, 2022	Speech Synthesis	Unverified
Deep Speech Synthesis from Multimodal Articulatory Representations	Dec 17, 2024	Speech Synthesis, Transfer Learning	Unverified
Deep Text-to-Speech System with Seq2Seq Model	Mar 11, 2019	model, Speech Synthesis	Unverified
Deliberation Networks and How to Train Them	Nov 6, 2022	Machine Translation, Speech Synthesis	Unverified
De l'utilisation de descripteurs issus de la linguistique computationnelle dans le cadre de la synthèse par HMM (Toward the use of information density based descriptive features in HMM based speech synthesis)	Jul 1, 2016	Descriptive, SENTER	Unverified
Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis	Apr 14, 2021	Dependency Parsing, Representation Learning	Unverified
Design and development a children's speech database	May 25, 2016	speech-recognition, Speech Recognition	Unverified

Datasets (one benchmark table each, in this order): LibriTTS, North American English, LJSpeech, Mandarin Chinese, Blizzard Challenge 2013

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	PeriodWave-Turbo-L	PESQ	4.45	—	Unverified
2	BigVGAN-v2	PESQ	4.36	—	Unverified
3	EVA-GAN-big	PESQ	4.35	—	Unverified
4	PeriodWave + FreeU	PESQ	4.25	—	Unverified
5	RFWave	PESQ	4.23	—	Unverified
6	BigVSAN (w/ snakebeta)	PESQ	4.12	—	Unverified
7	BigVSAN	PESQ	4.12	—	Unverified
8	EVA-GAN-base	PESQ	4.03	—	Unverified
9	BigVGAN	PESQ	4.03	—	Unverified
10	Vocos	PESQ	3.7	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	4.53	—	Unverified
2	WaveNet (Linguistic)	Mean Opinion Score	4.34	—	Unverified
3	WaveNet (L+F)	Mean Opinion Score	4.21	—	Unverified
4	Tacotron	Mean Opinion Score	4	—	Unverified
5	HMM-driven concatenative	Mean Opinion Score	3.86	—	Unverified
6	LSTM-RNN parametric	Mean Opinion Score	3.67	—	Unverified
7	means	Mean Opinion Score	0	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	BDDM vocoder	Mean Opinion Score	4.48	—	Unverified
2	DiffWave LARGE	Mean Opinion Score	4.44	—	Unverified
3	Neural HMM	Mean Opinion Score	3.24	—	Unverified
4	Neural HMM Ablation with 1 state per phone	Mean Opinion Score	2.68	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	WaveNet (L+F)	Mean Opinion Score	4.08	—	Unverified
2	LSTM-RNN parametric	Mean Opinion Score	3.79	—	Unverified
3	HMM-driven concatenative	Mean Opinion Score	3.47	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SampleRNN (2-tier)	NLL	1.39	—	Unverified
2	SampleRNN (3-tier)	NLL	1.39	—	Unverified