Speech Synthesis

Speech synthesis is the task of generating speech from another modality, such as text or lip movements.

Please note that the leaderboards here are not directly comparable between studies, because they use mean opinion score (MOS) as the metric and each study collects its own samples from Amazon Mechanical Turk.
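To illustrate why such scores are hard to compare, here is a minimal sketch of how a mean opinion score and an approximate 95% confidence interval might be computed from listener ratings. The function name `mean_opinion_score`, the two rating lists, and the normal-approximation interval are assumptions made for this example; they are not taken from any paper listed on this page.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, half-width of an approximate 95% confidence interval)."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    mos = statistics.mean(ratings)
    # Normal approximation: 1.96 standard errors on each side of the mean.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical 1-5 ratings of the same system from two different listener pools.
panel_a = [5, 4, 5, 4, 5, 4, 4, 5]
panel_b = [4, 3, 4, 4, 3, 4, 5, 3]

for name, ratings in (("panel A", panel_a), ("panel B", panel_b)):
    mos, half_width = mean_opinion_score(ratings)
    print(f"{name}: MOS = {mos:.2f} +/- {half_width:.2f}")
```

In this made-up case the same system receives noticeably different scores from the two panels, which is the reason MOS values gathered in separate studies should be read as indicative rather than as a strict ranking.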

(Image credit: WaveNet: A generative model for raw audio)

Papers


Showing 551–575 of 1249 papers

Title	Date	Tasks	Status	Hype
VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection	Jun 15, 2022	feature selection, Speech Synthesis	Unverified	0
RF-Next: Efficient Receptive Field Search for Convolutional Neural Networks	Jun 14, 2022	Action Segmentation, Instance Segmentation	Code Available	1
BigVGAN: A Universal Neural Vocoder with Large-Scale Training	Jun 9, 2022	Audio Generation, Audio Synthesis	Code Available	3
Unsupervised TTS Acoustic Modeling for TTS with Conditional Disentangled Sequential VAE	Jun 6, 2022	Representation Learning, Speech Representation Learning	Unverified	0
Pronunciation Dictionary-Free Multilingual Speech Synthesis by Combining Unsupervised and Supervised Phonetic Representations	Jun 2, 2022	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
BU-TTS: An Open-Source, Bilingual Welsh-English, Text-to-Speech Corpus	Jun 1, 2022	Speech Synthesis, text-to-speech	Unverified	0
SyntAct: A Synthesized Database of Basic Emotions	Jun 1, 2022	Emotion Recognition, Speech Emotion Recognition	Unverified	0
Investigating Inter- and Intra-speaker Voice Conversion using Audiobooks	Jun 1, 2022	Speech Synthesis, text-to-speech	Unverified	0
Exploring Transfer Learning for Urdu Speech Synthesis	Jun 1, 2022	Speech Synthesis, text-to-speech	Unverified	0
Building Open-source Speech Technology for Low-resource Minority Languages with Sámi as an Example – Tools, Methods and Experiments	Jun 1, 2022	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
AiRO - an Interactive Learning Tool for Children at Risk of Dyslexia	Jun 1, 2022	Speech Synthesis	Unverified	0
Preparing an Endangered Language for the Digital Age: The Case of Judeo-Spanish	May 31, 2022	Machine Translation, Speech Synthesis	Code Available	0
StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis	May 30, 2022	Data Augmentation, Self-Supervised Learning	Code Available	2
TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation	May 25, 2022	Representation Learning, Rhythm	Code Available	1
PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit	May 20, 2022	All, Automatic Speech Recognition (ASR)	Code Available	6
End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions	May 19, 2022	Speech Synthesis, Style Transfer	Code Available	1
SDS-200: A Swiss German Speech to Standard German Text Corpus	May 19, 2022	Speech Synthesis, Translation	Code Available	0
Macedonian Speech Synthesis for Assistive Technology Applications	May 18, 2022	Deep Learning, Pitch control	Unverified	0
GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech	May 15, 2022	Speech Synthesis, Style Transfer	Code Available	2
Real-Time Packet Loss Concealment With Mixed Generative and Predictive Model	May 11, 2022	Packet Loss Concealment, Speech Enhancement	Code Available	3
Read the Room: Adapting a Robot's Voice to Ambient and Social Contexts	May 10, 2022	Speech Synthesis, Voice Conversion	Code Available	0
Deep Learning Enabled Semantic Communications with Speech Recognition and Synthesis	May 9, 2022	Deep Learning, Semantic Communication	Code Available	1
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality	May 9, 2022	Sentence, Speech Synthesis	Code Available	2
ReCAB-VAE: Gumbel-Softmax Variational Inference Based on Analytic Divergence	May 9, 2022	Speech Synthesis, text-to-speech	Unverified	0
SVTS: Scalable Video-to-Speech Synthesis	May 4, 2022	Speech Synthesis	Code Available	1


Datasets, in the order of the tables under Benchmark Results: LibriTTS, North American English, LJSpeech, Mandarin Chinese, Blizzard Challenge 2013

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	PeriodWave-Turbo-L	PESQ	4.45	—	Unverified
2	BigVGAN-v2	PESQ	4.36	—	Unverified
3	EVA-GAN-big	PESQ	4.35	—	Unverified
4	PeriodWave + FreeU	PESQ	4.25	—	Unverified
5	RFWave	PESQ	4.23	—	Unverified
6	BigVSAN (w/ snakebeta)	PESQ	4.12	—	Unverified
7	BigVSAN	PESQ	4.12	—	Unverified
8	EVA-GAN-base	PESQ	4.03	—	Unverified
9	BigVGAN	PESQ	4.03	—	Unverified
10	Vocos	PESQ	3.7	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	4.53	—	Unverified
2	WaveNet (Linguistic)	Mean Opinion Score	4.34	—	Unverified
3	WaveNet (L+F)	Mean Opinion Score	4.21	—	Unverified
4	Tacotron	Mean Opinion Score	4	—	Unverified
5	HMM-driven concatenative	Mean Opinion Score	3.86	—	Unverified
6	LSTM-RNN parametric	Mean Opinion Score	3.67	—	Unverified
7	means	Mean Opinion Score	0	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	BDDM vocoder	Mean Opinion Score	4.48	—	Unverified
2	DiffWave LARGE	Mean Opinion Score	4.44	—	Unverified
3	Neural HMM	Mean Opinion Score	3.24	—	Unverified
4	Neural HMM Ablation with 1 state per phone	Mean Opinion Score	2.68	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	WaveNet (L+F)	Mean Opinion Score	4.08	—	Unverified
2	LSTM-RNN parametric	Mean Opinion Score	3.79	—	Unverified
3	HMM-driven concatenative	Mean Opinion Score	3.47	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SampleRNN (2-tier)	NLL	1.39	—	Unverified
2	SampleRNN (3-tier)	NLL	1.39	—	Unverified