Speech Synthesis

Speech synthesis is the task of generating speech from another modality, such as text or lip movements.

Please note that the leaderboards here are not directly comparable between studies: they use mean opinion score (MOS) as the metric, and each study collects a different sample of ratings from Amazon Mechanical Turk.
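To illustrate why MOS figures from different studies do not line up, here is a minimal sketch of how a mean opinion score is typically computed: the average of listener ratings on a 1 to 5 scale, reported here with a normal-approximation 95% confidence interval. The function name `mean_opinion_score` and both rating panels are hypothetical, made up for illustration; they do not come from any study listed on this page.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, 95% CI half-width) for a list of 1-5 listener ratings."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be on the 1-5 scale")
    mos = statistics.mean(ratings)
    # Normal-approximation interval: 1.96 * standard error of the mean.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings for the same system from two different listener panels.
panel_a = [5, 4, 5, 4, 4, 5, 3, 4]
panel_b = [4, 3, 4, 4, 3, 5, 3, 3]

mos_a, ci_a = mean_opinion_score(panel_a)
mos_b, ci_b = mean_opinion_score(panel_b)
print(f"Panel A: {mos_a:.2f} +/- {ci_a:.2f}")
print(f"Panel B: {mos_b:.2f} +/- {ci_b:.2f}")
```

In this made-up case the two panels disagree by more than half a point on the same system, so a gap of that size between two leaderboard entries rated by different crowds need not reflect a real quality difference.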

(Image credit: WaveNet: A generative model for raw audio)

Papers

Showing 651–675 of 1249 papers

Title	Date	Tasks	Status
PoeticTTS -- Controllable Poetry Reading for Literary Studies	Jul 11, 2022	Speech Synthesis	Unverified
Speaker Anonymization with Phonetic Intermediate Representations	Jul 11, 2022	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis	Jul 8, 2022	Lip to Speech Synthesis, Speech Synthesis	Code Available
End-to-End Binaural Speech Synthesis	Jul 8, 2022	Decoder, Speech Synthesis	Unverified
Mix and Match: An Empirical Study on Training Corpus Composition for Polyglot Text-To-Speech (TTS)	Jul 4, 2022	Speech Synthesis, text-to-speech	Unverified
BERT, can HE predict contrastive focus? Predicting and controlling prominence in neural TTS using a language model	Jul 4, 2022	Language Modeling, Language Modelling	Unverified
Computer-assisted Pronunciation Training -- Speech synthesis is almost all you need	Jul 2, 2022	All, Speech Synthesis	Unverified
TTS-by-TTS 2: Data-selective augmentation for neural speech synthesis using ranking support vector machine with variational autoencoder	Jun 30, 2022	Speech Synthesis, text-to-speech	Unverified
R-MelNet: Reduced Mel-Spectral Modeling for Neural TTS	Jun 30, 2022	Decoder, GPU	Unverified
iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis based on Disentanglement between Prosody and Timbre	Jun 29, 2022	Disentanglement, Speaker Identification	Unverified
Expressive, Variable, and Controllable Duration Modelling in TTS	Jun 28, 2022	Normalising Flows, Speech Synthesis	Unverified
Self-supervised Context-aware Style Representation for Expressive Speech Synthesis	Jun 25, 2022	Contrastive Learning, Deep Clustering	Unverified
WOLONet: Wave Outlooker for Efficient and High Fidelity Speech Synthesis	Jun 20, 2022	CPU, Speech Synthesis	Unverified
Acoustic Modeling for End-to-End Empathetic Dialogue Speech Synthesis Using Linguistic and Prosodic Contexts of Dialogue History	Jun 16, 2022	Self-Supervised Learning, Sentence	Unverified
VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection	Jun 15, 2022	feature selection, Speech Synthesis	Unverified
Unsupervised TTS Acoustic Modeling for TTS with Conditional Disentangled Sequential VAE	Jun 6, 2022	Representation Learning, Speech Representation Learning	Unverified
Pronunciation Dictionary-Free Multilingual Speech Synthesis by Combining Unsupervised and Supervised Phonetic Representations	Jun 2, 2022	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Building Open-source Speech Technology for Low-resource Minority Languages with Sámi as an Example – Tools, Methods and Experiments	Jun 1, 2022	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
BU-TTS: An Open-Source, Bilingual Welsh-English, Text-to-Speech Corpus	Jun 1, 2022	Speech Synthesis, text-to-speech	Unverified
SyntAct: A Synthesized Database of Basic Emotions	Jun 1, 2022	Emotion Recognition, Speech Emotion Recognition	Unverified
AiRO - an Interactive Learning Tool for Children at Risk of Dyslexia	Jun 1, 2022	Speech Synthesis	Unverified
Exploring Transfer Learning for Urdu Speech Synthesis	Jun 1, 2022	Speech Synthesis, text-to-speech	Unverified
Investigating Inter- and Intra-speaker Voice Conversion using Audiobooks	Jun 1, 2022	Speech Synthesis, text-to-speech	Unverified
Preparing an Endangered Language for the Digital Age: The Case of Judeo-Spanish	May 31, 2022	Machine Translation, Speech Synthesis	Code Available
SDS-200: A Swiss German Speech to Standard German Text Corpus	May 19, 2022	Speech Synthesis, Translation	Code Available

Datasets: LibriTTS, North American English, LJSpeech, Mandarin Chinese, Blizzard Challenge 2013

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	PeriodWave-Turbo-L	PESQ	4.45	—	Unverified
2	BigVGAN-v2	PESQ	4.36	—	Unverified
3	EVA-GAN-big	PESQ	4.35	—	Unverified
4	PeriodWave + FreeU	PESQ	4.25	—	Unverified
5	RFWave	PESQ	4.23	—	Unverified
6	BigVSAN (w/ snakebeta)	PESQ	4.12	—	Unverified
7	BigVSAN	PESQ	4.12	—	Unverified
8	EVA-GAN-base	PESQ	4.03	—	Unverified
9	BigVGAN	PESQ	4.03	—	Unverified
10	Vocos	PESQ	3.7	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	4.53	—	Unverified
2	WaveNet (Linguistic)	Mean Opinion Score	4.34	—	Unverified
3	WaveNet (L+F)	Mean Opinion Score	4.21	—	Unverified
4	Tacotron	Mean Opinion Score	4	—	Unverified
5	HMM-driven concatenative	Mean Opinion Score	3.86	—	Unverified
6	LSTM-RNN parametric	Mean Opinion Score	3.67	—	Unverified
7	means	Mean Opinion Score	0	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	BDDM vocoder	Mean Opinion Score	4.48	—	Unverified
2	DiffWave LARGE	Mean Opinion Score	4.44	—	Unverified
3	Neural HMM	Mean Opinion Score	3.24	—	Unverified
4	Neural HMM Ablation with 1 state per phone	Mean Opinion Score	2.68	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	WaveNet (L+F)	Mean Opinion Score	4.08	—	Unverified
2	LSTM-RNN parametric	Mean Opinion Score	3.79	—	Unverified
3	HMM-driven concatenative	Mean Opinion Score	3.47	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SampleRNN (2-tier)	NLL	1.39	—	Unverified
2	SampleRNN (3-tier)	NLL	1.39	—	Unverified