Speech Synthesis

Speech synthesis is the task of generating speech from some other modality like text, lip movements etc.

Please note that the leaderboards here are not really comparable between studies - as they use mean opinion score as a metric and collect different samples from Amazon Mechnical Turk.

( Image credit: WaveNet: A generative model for raw audio )

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 461–470 of 1249 papers

Title	Date	Tasks	Status	Hype
Investigation of Japanese PnG BERT language model in text-to-speech synthesis for pitch accent language	Dec 16, 2022	Language ModelingLanguage Modelling	—Unverified	0
RWEN-TTS: Relation-aware Word Encoding Network for Natural Text-to-Speech Synthesis	Dec 15, 2022	RelationSpeech Synthesis	CodeCode Available	1
Style-Label-Free: Cross-Speaker Style Transfer by Quantized VAE and Speaker-wise Normalization in Speech Synthesis	Dec 13, 2022	Data AugmentationSpeech Synthesis	—Unverified	0
MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset	Dec 11, 2022	Speech Synthesistext-to-speech	CodeCode Available	1
SNAC: Speaker-normalized affine coupling layer in flow-based architecture for zero-shot multi-speaker text-to-speech	Nov 30, 2022	Speech Synthesistext-to-speech	—Unverified	0
VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing	Nov 30, 2022	Machine TranslationSentence	—Unverified	0
Controllable speech synthesis by learning discrete phoneme-level prosodic representations	Nov 29, 2022	ClusteringSpeech Synthesis	—Unverified	0
Contextual Expressive Text-to-Speech	Nov 26, 2022	Speech Synthesistext-to-speech	—Unverified	0
Efficient Incremental Text-to-Speech on GPUs	Nov 25, 2022	GPUSpeech Synthesis	—Unverified	0
PromptTTS: Controllable Text-to-Speech with Text Descriptions	Nov 22, 2022	DecoderSpeech Synthesis	CodeCode Available	0

Show:10 25 50

← PrevPage 47 of 125Next →

All datasets LibriTTS North American English LJSpeech Mandarin Chinese Blizzard Challenge 2013

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	PeriodWave-Turbo-L	PESQ	4.45	—	Unverified
2	BigVGAN-v2	PESQ	4.36	—	Unverified
3	EVA-GAN-big	PESQ	4.35	—	Unverified
4	PeriodWave + FreeU	PESQ	4.25	—	Unverified
5	RFWave	PESQ	4.23	—	Unverified
6	BigVSAN (w/ snakebeta)	PESQ	4.12	—	Unverified
7	BigVSAN	PESQ	4.12	—	Unverified
8	EVA-GAN-base	PESQ	4.03	—	Unverified
9	BigVGAN	PESQ	4.03	—	Unverified
10	Vocos	PESQ	3.7	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	4.53	—	Unverified
2	WaveNet (Linguistic)	Mean Opinion Score	4.34	—	Unverified
3	WaveNet (L+F)	Mean Opinion Score	4.21	—	Unverified
4	Tacotron	Mean Opinion Score	4	—	Unverified
5	HMM-driven concatenative	Mean Opinion Score	3.86	—	Unverified
6	LSTM-RNN parametric	Mean Opinion Score	3.67	—	Unverified
7	means	Mean Opinion Score	0	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	BDDM vocoder	Mean Opinion Score	4.48	—	Unverified
2	DiffWave LARGE	Mean Opinion Score	4.44	—	Unverified
3	Neural HMM	Mean Opinion Score	3.24	—	Unverified
4	Neural HMM Ablation with 1 state per phone	Mean Opinion Score	2.68	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	WaveNet (L+F)	Mean Opinion Score	4.08	—	Unverified
2	LSTM-RNN parametric	Mean Opinion Score	3.79	—	Unverified
3	HMM-driven concatenative	Mean Opinion Score	3.47	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SampleRNN (2-tier)	NLL	1.39	—	Unverified
2	SampleRNN (3-tier)	NLL	1.39	—	Unverified