Speech Synthesis

Speech synthesis is the task of generating speech from another modality, such as text or lip movements.

Note that the leaderboards here are not directly comparable between studies: they use mean opinion score (MOS) as the metric, and each study collects its ratings from a different sample of Amazon Mechanical Turk listeners.
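To illustrate why mean opinion scores from different studies are hard to compare, here is a minimal sketch that computes a MOS and an approximate 95% confidence interval from listener ratings. The function name `mean_opinion_score`, the normal-approximation interval (z = 1.96), and both rating lists are assumptions made up for illustration; they are not taken from any study listed on this page.

```python
import math
import statistics


def mean_opinion_score(ratings, z=1.96):
    """Return (mos, half_width) for a list of 1-5 listener ratings.

    half_width is the half-width of a normal-approximation confidence
    interval (z=1.96 corresponds to roughly 95%).
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    mos = statistics.fmean(ratings)
    # Standard error of the mean, using the sample standard deviation.
    stderr = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, z * stderr


# Hypothetical ratings of the same system from two different listener pools.
study_a = [5, 4, 5, 4, 5, 4, 4, 5]
study_b = [4, 3, 4, 4, 3, 4, 5, 3]

mos_a, ci_a = mean_opinion_score(study_a)
mos_b, ci_b = mean_opinion_score(study_b)
print(f"study A: MOS {mos_a:.2f} +/- {ci_a:.2f}")
print(f"study B: MOS {mos_b:.2f} +/- {ci_b:.2f}")
```

In this made-up case the same system receives noticeably different scores from the two pools, which is why a MOS is best read together with its confidence interval and only compared within a single listening test.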

(Image credit: WaveNet: A Generative Model for Raw Audio)

Papers

Showing 501–550 of 1249 papers

Title	Date	Tasks	Status
Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis	Feb 6, 2020	Disentanglement, Speech Synthesis	Unverified
Fully Unsupervised Training of Few-shot Keyword Spotting	Oct 6, 2022	Keyword Spotting, Metric Learning	Unverified
DiEmo-TTS: Disentangled Emotion Representations via Self-Supervised Distillation for Cross-Speaker Emotion Transfer in Text-to-Speech	May 26, 2025	Attribute, Emotional Speech Synthesis	Unverified
AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-Speech Synthesis	Apr 14, 2025	RAG, Retrieval-augmented Generation	Unverified
GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech Synthesis	Jun 29, 2021	Speech Synthesis, text-to-speech	Unverified
GANtron: Emotional Speech Synthesis with Generative Adversarial Networks	Oct 6, 2021	Emotional Speech Synthesis, Speech Synthesis	Unverified
A distributed cloud-based dialog system for conversational application development	Sep 1, 2015	Speech Recognition, Speech Synthesis	Unverified
HMM-based data augmentation for E2E systems for building conversational speech synthesis systems	Dec 22, 2022	Data Augmentation, Language Modeling	Unverified
Gender Bias in Instruction-Guided Speech Synthesis Models	Feb 8, 2025	Expressive Speech Synthesis, Speech Synthesis	Unverified
Generacion de voces artificiales infantiles en castellano con acento costarricense	Feb 2, 2021	Speech Synthesis	Unverified
Dictionary Update for NMF-based Voice Conversion Using an Encoder-Decoder Network	Oct 13, 2016	Decoder, Speech Enhancement	Unverified
Auto Spell Suggestion for High Quality Speech Synthesis in Hindi	Feb 15, 2014	Speech Synthesis, text-to-speech	Unverified
Autoregressive Speech Synthesis without Vector Quantization	Jul 11, 2024	Audio Compression, Diversity	Unverified
Development of Marathi Part of Speech Tagger Using Statistical Approach	Oct 2, 2013	Information Retrieval, Part-Of-Speech Tagging	Unverified
A Discourse-level Multi-scale Prosodic Model for Fine-grained Emotion Analysis	Sep 21, 2023	Emotion Recognition, Speech Synthesis	Unverified
Development of Mandarin-English code-switching speech synthesis system	Nov 1, 2022	Sentence, Speech Synthesis	Unverified
Development and Evaluation of Speech Synthesis Corpora for Latvian	May 1, 2020	speech-recognition, Speech Recognition	Unverified
Autoregressive Speech Synthesis with Next-Distribution Prediction	Dec 22, 2024	Language Modeling, Language Modelling	Unverified
Autoregressive Diffusion Transformer for Text-to-Speech Synthesis	Jun 8, 2024	Audio Generation, Decoder	Unverified
An Empirical Study of Speech Language Models for Prompt-Conditioned Speech Synthesis	Mar 19, 2024	In-Context Learning, Speech Synthesis	Unverified
A Deterministic plus Stochastic Model of the Residual Signal for Improved Parametric Speech Synthesis	Dec 29, 2019	Speech Synthesis	Unverified
Designing the Next Generation of Intelligent Personal Robotic Assistants for the Physically Impaired	Nov 28, 2019	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Designing the Latvian Speech Recognition Corpus	May 1, 2014	speech-recognition, Speech Recognition	Unverified
Designing Language Technology Applications: A Wizard of Oz Driven Prototyping Framework	Apr 1, 2014	Machine Translation, Speech Recognition	Unverified
Designing French Tale Corpora for Entertaining Text To Speech Synthesis	May 1, 2012	Sentence, Speech Synthesis	Unverified
Automatic Syllabification for Manipuri language	Dec 1, 2016	Automatic Speech Recognition (ASR), Segmentation	Unverified
Design and Development of Speech Corpora for Air Traffic Control Training	May 1, 2018	Automatic Speech Recognition (ASR), Speech Recognition	Unverified
Design and development a children's speech database	May 25, 2016	speech-recognition, Speech Recognition	Unverified
Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis	Apr 14, 2021	Dependency Parsing, Representation Learning	Unverified
De l'utilisation de descripteurs issus de la linguistique computationnelle dans le cadre de la synthèse par HMM (Toward the use of information density based descriptive features in HMM based speech synthesis)	Jul 1, 2016	Descriptive, SENTER	Unverified
Automatic Prosody Prediction for Chinese Speech Synthesis using BLSTM-RNN and Embedding Features	Nov 2, 2015	Feature Engineering, Prosody Prediction	Unverified
An analysis on the effects of speaker embedding choice in non auto-regressive TTS	Jul 19, 2023	Representation Learning, Speech Synthesis	Unverified
Listening while Speaking and Visualizing: Improving ASR through Multimodal Chain	Jun 3, 2019	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
HMM-based Mandarin Singing Voice Synthesis Using Tailored Synthesis Units and Question Sets	Dec 1, 2013	Singing Voice Synthesis, Speech Synthesis	Unverified
Improved pronunciation prediction accuracy using morphology	Aug 1, 2021	LEMMA, Morphological Inflection	Unverified
Automatic Evaluation of Turn-taking Cues in Conversational Speech Synthesis	May 29, 2023	Speech Synthesis, text-to-speech	Unverified
Deliberation Networks and How to Train Them	Nov 6, 2022	Machine Translation, Speech Synthesis	Unverified
High-quality nonparallel voice conversion based on cycle-consistent adversarial network	Apr 2, 2018	Generative Adversarial Network, Image-to-Image Translation	Unverified
An Analysis of the Effect of Emotional Speech Synthesis on Non-Task-Oriented Dialogue System	Jul 1, 2018	Dialogue Management, Emotional Speech Synthesis	Unverified
Automatic Arabic Dialect Identification Systems for Written Texts: A Survey	Sep 26, 2020	Dialect Identification, Machine Translation	Unverified
High-Quality Automatic Voice Over with Accurate Alignment: Supervision through Self-Supervised Discrete Speech Units	Jun 29, 2023	Speech Synthesis, text-to-speech	Unverified
High-quality Speech Synthesis Using Super-resolution Mel-Spectrogram	Dec 3, 2019	Image-to-Image Translation, Speech Enhancement	Unverified
Automatically Acquiring Fine-Grained Information Status Distinctions in German	Jul 1, 2012	Coreference Resolution, Speech Synthesis	Unverified
Deep Text-to-Speech System with Seq2Seq Model	Mar 11, 2019	model, Speech Synthesis	Unverified
A Deep Learning Approach to Data-driven Parameterizations for Statistical Parametric Speech Synthesis	Sep 30, 2014	Denoising, Speech Synthesis	Unverified
Automated detection of pronunciation errors in non-native English speech employing deep learning	Sep 13, 2022	Speech Synthesis	Unverified
Deep Speech Synthesis from Multimodal Articulatory Representations	Dec 17, 2024	Speech Synthesis, Transfer Learning	Unverified
Analyzing Speech Unit Selection for Textless Speech-to-Speech Translation	Jul 8, 2024	Automatic Speech Recognition, Emotion Recognition	Unverified
High Fidelity Text-to-Speech Via Discrete Tokens Using Token Transducer and Group Masked Language Model	Jun 25, 2024	Computational Efficiency, Language Modeling	Unverified
High Quality Streaming Speech Synthesis with Low, Sentence-Length-Independent Latency	Nov 17, 2021	CPU, Decoder	Unverified

Datasets, in the order of the tables under Benchmark Results: LibriTTS, North American English, LJSpeech, Mandarin Chinese, Blizzard Challenge 2013

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	PeriodWave-Turbo-L	PESQ	4.45	—	Unverified
2	BigVGAN-v2	PESQ	4.36	—	Unverified
3	EVA-GAN-big	PESQ	4.35	—	Unverified
4	PeriodWave + FreeU	PESQ	4.25	—	Unverified
5	RFWave	PESQ	4.23	—	Unverified
6	BigVSAN (w/ snakebeta)	PESQ	4.12	—	Unverified
7	BigVSAN	PESQ	4.12	—	Unverified
8	EVA-GAN-base	PESQ	4.03	—	Unverified
9	BigVGAN	PESQ	4.03	—	Unverified
10	Vocos	PESQ	3.7	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	4.53	—	Unverified
2	WaveNet (Linguistic)	Mean Opinion Score	4.34	—	Unverified
3	WaveNet (L+F)	Mean Opinion Score	4.21	—	Unverified
4	Tacotron	Mean Opinion Score	4	—	Unverified
5	HMM-driven concatenative	Mean Opinion Score	3.86	—	Unverified
6	LSTM-RNN parametric	Mean Opinion Score	3.67	—	Unverified
7	means	Mean Opinion Score	0	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	BDDM vocoder	Mean Opinion Score	4.48	—	Unverified
2	DiffWave LARGE	Mean Opinion Score	4.44	—	Unverified
3	Neural HMM	Mean Opinion Score	3.24	—	Unverified
4	Neural HMM Ablation with 1 state per phone	Mean Opinion Score	2.68	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	WaveNet (L+F)	Mean Opinion Score	4.08	—	Unverified
2	LSTM-RNN parametric	Mean Opinion Score	3.79	—	Unverified
3	HMM-driven concatenative	Mean Opinion Score	3.47	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SampleRNN (2-tier)	NLL	1.39	—	Unverified
2	SampleRNN (3-tier)	NLL	1.39	—	Unverified