Speech Synthesis

Speech synthesis is the task of generating speech from another modality, such as text or lip movements.

Please note that the leaderboards here are not directly comparable between studies, because they use mean opinion score (MOS) as the metric and each study collects its ratings from a different sample of Amazon Mechanical Turk workers.
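To illustrate why mean opinion scores from different listener samples are hard to compare, here is a minimal sketch that computes a MOS and an approximate 95% confidence interval from a list of 1-5 ratings. The function name `mean_opinion_score`, the normal-approximation interval, and the two rating pools are assumptions made for illustration; they are not taken from any of the listed papers.

```python
import math
import statistics


def mean_opinion_score(ratings, z=1.96):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Uses a normal approximation, which is only a rough guide for the
    small rating pools shown here.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    mos = statistics.fmean(ratings)
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical 1-5 ratings of the same system from two different listener pools.
pool_a = [5, 4, 5, 4, 5, 4, 4, 5]
pool_b = [4, 3, 4, 4, 3, 4, 5, 3]

for name, ratings in (("pool A", pool_a), ("pool B", pool_b)):
    mos, ci = mean_opinion_score(ratings)
    print(f"{name}: MOS = {mos:.2f} +/- {ci:.2f}")
```

Two pools rating the same audio can produce noticeably different means, so a gap between two papers' reported MOS values may reflect the raters rather than the models.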

(Image credit: WaveNet: A generative model for raw audio)

Papers

Showing 351–400 of 1249 papers

Title	Date	Tasks	Status
Continuous Expressive Speaking Styles Synthesis based on CVSM and MR-HMM	Dec 1, 2016	Expressive Speech Synthesis, Speech Recognition	Unverified
Semi-supervised learning for continuous emotional intensity controllable speech synthesis with disentangled representations	Nov 11, 2022	Emotional Speech Synthesis, Speech Synthesis	Unverified
Continuous Autoregressive Modeling with Stochastic Monotonic Alignment for Speech Synthesis	Feb 3, 2025	Quantization, Speech Synthesis	Unverified
A Machine of Few Words -- Interactive Speaker Recognition with Reinforcement Learning	Aug 7, 2020	Decision Making, reinforcement-learning	Unverified
Continual Speaker Adaptation for Text-to-Speech Synthesis	Mar 26, 2021	Continual Learning, Diversity	Unverified
Contextual Expressive Text-to-Speech	Nov 26, 2022	Speech Synthesis, text-to-speech	Unverified
A Survey on Bridging EEG Signals and Generative AI: From Image and Text to Beyond	Feb 17, 2025	Contrastive Learning, EEG	Unverified
Constructive Interaction for Talking about Interesting Topics	May 1, 2012	Management, Speech Recognition	Unverified
Construction of English-French Multimodal Affective Conversational Corpus from TV Dramas	May 1, 2018	Emotion Recognition, Speech Recognition	Unverified
A Survey of Voice Translation Methodologies - Acoustic Dialect Decoder	Oct 13, 2016	Decoder, Sentence	Unverified
Alternate Endings: Improving Prosody for Incremental Neural TTS with Predicted Future Text Input	Feb 19, 2021	Language Modeling, Language Modelling	Unverified
Conditioning Sequence-to-sequence Networks with Learned Activations	Sep 29, 2021	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Conditional Spoken Digit Generation with StyleGAN	Sep 15, 2020	Image Generation, Speech Synthesis	Unverified
A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis	Aug 3, 2022	Speech Synthesis, text-to-speech	Unverified
A Streamwise GAN Vocoder for Wideband Speech Coding at Very Low Bit Rate	Aug 9, 2021	Speech Synthesis	Unverified
Aligning phonemes using finte-state methods	May 1, 2017	Speech Synthesis, Spelling Correction	Unverified
CONCSS: Contrastive-based Context Comprehension for Dialogue-appropriate Prosody in Conversational Speech Synthesis	Dec 16, 2023	Contrastive Learning, Self-Supervised Learning	Unverified
Computer-assisted Pronunciation Training -- Speech synthesis is almost all you need	Jul 2, 2022	All, Speech Synthesis	Unverified
AS-Speech: Adaptive Style For Speech Synthesis	Sep 9, 2024	Rhythm, Speech Synthesis	Unverified
Computer-Aided Quality Assurance of an Icelandic Pronunciation Dictionary	May 1, 2014	speech-recognition, Speech Recognition	Unverified
Complete reconstruction of the tongue contour through acoustic to articulatory inversion using real-time MRI data	Nov 4, 2024	Speech Synthesis	Unverified
Assessing Evaluation Metrics for Speech-to-Speech Translation	Oct 26, 2021	Machine Translation, Open-Ended Question Answering	Unverified
Aligning Opinions: Cross-Lingual Opinion Mining with Dependencies	Jul 1, 2015	Coreference Resolution, Named Entity Recognition (NER)	Unverified
Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History	Jun 16, 2022	Self-Supervised Learning, Sentence	Unverified
Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding	Oct 17, 2024	Speech Synthesis	Unverified
Comparing performance of different set-covering strategies for linguistic content optimization in speech corpora	May 1, 2012	Descriptive, Speech Recognition	Unverified
Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech	Jul 31, 2023	Acoustic Modelling, Speech Synthesis	Unverified
Compact Neural TTS Voices for Accessibility	Jan 28, 2025	Speech Synthesis, text-to-speech	Unverified
ASR-based Features for Emotion Recognition: A Transfer Learning Approach	May 23, 2018	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation	Apr 29, 2025	In-Context Learning, Speech Synthesis	Unverified
Combining Manual and Automatic Prosodic Annotation for Expressive Speech Synthesis	May 1, 2016	Expressive Speech Synthesis, Speech Synthesis	Unverified
Combining Incremental Language Generation and Incremental Speech Synthesis for Adaptive Information Presentation	Jul 1, 2012	Speech Synthesis, Spoken Dialogue Systems	Unverified
Combining Human Inputters and Language Services to provide Multi-language support system for International Symposiums	Dec 1, 2016	Automatic Speech Recognition (ASR), Machine Translation	Unverified
Alert!... Calm Down, There is Nothing to Worry About. Warning and Soothing Speech Synthesis.	May 1, 2014	Expressive Speech Synthesis, Sentence	Unverified
A Corpus of Neutral Voice Speech in Brazilian Portuguese	May 21, 2021	Speech Synthesis, text-to-speech	Unverified
Collective Learning Mechanism based Optimal Transport Generative Adversarial Network for Non-parallel Voice Conversion	Apr 18, 2025	Generative Adversarial Network, Image Generation	Unverified
Collaborative Watermarking for Adversarial Speech Synthesis	Sep 26, 2023	Speaker Verification, Speech Synthesis	Unverified
ArVoice: A Multi-Speaker Dataset for Arabic Speech Synthesis	May 26, 2025	DeepFake Detection, Face Swapping	Unverified
Code-Mixed Text to Speech Synthesis under Low-Resource Constraints	Dec 2, 2023	Speech Synthesis, text-to-speech	Unverified
CodecFake: Enhancing Anti-Spoofing Models Against Deepfake Audios from Codec-Based Speech Synthesis Systems	Jun 11, 2024	Audio Synthesis, Face Swapping	Unverified
AiRO - an Interactive Learning Tool for Children at Risk of Dyslexia	Jun 1, 2022	Speech Synthesis	Unverified
CoALT: A Software for Comparing Automatic Labelling Tools	May 1, 2012	Speech Recognition, Speech Synthesis	Unverified
CML-TTS A Multilingual Dataset for Speech Synthesis in Low-Resource Languages	Jun 16, 2023	Speech Synthesis, text-to-speech	Unverified
Articulatory-WaveNet: Autoregressive Model For Acoustic-to-Articulatory Inversion	Jun 22, 2020	Speech Synthesis	Unverified
AI-Based IVR	Aug 20, 2024	Speech Synthesis, Speech-to-Text	Unverified
A Conventional Orthography for Tunisian Arabic	May 1, 2014	Language Modelling, Machine Translation	Unverified
Accelerating Autoregressive Speech Synthesis Inference With Speech Speculative Decoding	May 21, 2025	Speech Synthesis	Unverified
Cloning one's voice using very limited data in the wild	Oct 7, 2021	Speech Synthesis	Unverified
CleanUNet 2: A Hybrid Speech Denoising Model on Waveform and Spectrogram	Sep 12, 2023	Denoising, Speech Denoising	Unverified
ClArTTS: An Open-Source Classical Arabic Text-to-Speech Corpus	Feb 28, 2023	Speech Synthesis, text-to-speech	Unverified

Datasets: LibriTTS, North American English, LJSpeech, Mandarin Chinese, Blizzard Challenge 2013

Benchmark Results

Dataset: LibriTTS
#	Model	Metric	Claimed	Verified	Status
1	PeriodWave-Turbo-L	PESQ	4.45	—	Unverified
2	BigVGAN-v2	PESQ	4.36	—	Unverified
3	EVA-GAN-big	PESQ	4.35	—	Unverified
4	PeriodWave + FreeU	PESQ	4.25	—	Unverified
5	RFWave	PESQ	4.23	—	Unverified
6	BigVSAN (w/ snakebeta)	PESQ	4.12	—	Unverified
7	BigVSAN	PESQ	4.12	—	Unverified
8	EVA-GAN-base	PESQ	4.03	—	Unverified
9	BigVGAN	PESQ	4.03	—	Unverified
10	Vocos	PESQ	3.7	—	Unverified

Dataset: North American English
#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	4.53	—	Unverified
2	WaveNet (Linguistic)	Mean Opinion Score	4.34	—	Unverified
3	WaveNet (L+F)	Mean Opinion Score	4.21	—	Unverified
4	Tacotron	Mean Opinion Score	4	—	Unverified
5	HMM-driven concatenative	Mean Opinion Score	3.86	—	Unverified
6	LSTM-RNN parametric	Mean Opinion Score	3.67	—	Unverified
7	means	Mean Opinion Score	0	—	Unverified

Dataset: LJSpeech
#	Model	Metric	Claimed	Verified	Status
1	BDDM vocoder	Mean Opinion Score	4.48	—	Unverified
2	DiffWave LARGE	Mean Opinion Score	4.44	—	Unverified
3	Neural HMM	Mean Opinion Score	3.24	—	Unverified
4	Neural HMM Ablation with 1 state per phone	Mean Opinion Score	2.68	—	Unverified

Dataset: Mandarin Chinese
#	Model	Metric	Claimed	Verified	Status
1	WaveNet (L+F)	Mean Opinion Score	4.08	—	Unverified
2	LSTM-RNN parametric	Mean Opinion Score	3.79	—	Unverified
3	HMM-driven concatenative	Mean Opinion Score	3.47	—	Unverified

Dataset: Blizzard Challenge 2013
#	Model	Metric	Claimed	Verified	Status
1	SampleRNN (2-tier)	NLL	1.39	—	Unverified
2	SampleRNN (3-tier)	NLL	1.39	—	Unverified