Speech Synthesis

Speech synthesis is the task of generating speech from some other modality like text, lip movements etc.

Please note that the leaderboards here are not really comparable between studies - as they use mean opinion score as a metric and collect different samples from Amazon Mechnical Turk.

( Image credit: WaveNet: A generative model for raw audio )

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 551–575 of 1249 papers

Title	Date	Tasks	Status
Automatic Evaluation of Turn-taking Cues in Conversational Speech Synthesis	May 29, 2023	Speech Synthesistext-to-speech	—Unverified
Creating Personalized Synthetic Voices from Post-Glossectomy Speech with Guided Diffusion Models	May 27, 2023	Speech SynthesisVoice Conversion	—Unverified
Spoken Question Answering and Speech Continuation Using Spectrogram-Powered LLM	May 24, 2023	Language ModellingQuestion Answering	CodeCode Available
ChatGPT-EDSS: Empathetic Dialogue Speech Synthesis Trained from ChatGPT-derived Context Word Embeddings	May 23, 2023	ChatbotReading Comprehension	—Unverified
CALLS: Japanese Empathetic Dialogue Speech Corpus of Complaint Handling and Attentive Listening in Customer Center	May 23, 2023	Speech Synthesis	—Unverified
ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models	May 23, 2023	Speech Synthesistext-to-speech	—Unverified
Text Generation with Speech Synthesis for ASR Data Augmentation	May 22, 2023	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified
VAKTA-SETU: A Speech-to-Speech Machine Translation Service in Select Indic Languages	May 21, 2023	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified
MParrotTTS: Multilingual Multi-speaker Text to Speech Synthesis in Low Resource Setting	May 19, 2023	Speech Synthesistext-to-speech	—Unverified
Improving Generalization Ability of Countermeasures for New Mismatch Scenario by Combining Multiple Advanced Regularization Terms	May 18, 2023	Speech Synthesis	CodeCode Available
A unified front-end framework for English text-to-speech synthesis	May 18, 2023	Speech SynthesisText Normalization	—Unverified
Empirical Analysis of Oral and Nasal Vowels of Konkani	May 17, 2023	Speech Synthesis	—Unverified
Zero-shot personalized lip-to-speech synthesis with face image based voice control	May 9, 2023	Lip to Speech SynthesisRepresentation Learning	—Unverified
Accented Text-to-Speech Synthesis with Limited Data	May 8, 2023	Speech Synthesistext-to-speech	—Unverified
M2-CTTS: End-to-End Multi-scale Multi-modal Conversational Text-to-Speech Synthesis	May 3, 2023	Speech Synthesistext-to-speech	—Unverified
A Review of Deep Learning Techniques for Speech Processing	Apr 30, 2023	Automatic Speech RecognitionDeep Learning	—Unverified
Zero-shot text-to-speech synthesis conditioned using self-supervised speech representation model	Apr 24, 2023	RhythmSelf-Supervised Learning	—Unverified
Ensemble prosody prediction for expressive speech synthesis	Apr 3, 2023	DiversityEnsemble Learning	—Unverified
Text is All You Need: Personalizing ASR Models using Controllable Speech Synthesis	Mar 27, 2023	AllAutomatic Speech Recognition	—Unverified
Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-Based Speech Synthesis	Mar 24, 2023	Generative Adversarial NetworkSpeech Synthesis	—Unverified
A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI	Mar 23, 2023	Speech EnhancementSpeech Synthesis	—Unverified
Transformers in Speech Processing: A Survey	Mar 21, 2023	Automatic Speech RecognitionSpeech Enhancement	—Unverified
Improving Prosody for Cross-Speaker Style Transfer by Semi-Supervised Style Extractor and Hierarchical Modeling in Speech Synthesis	Mar 14, 2023	Prosody PredictionSpeech Synthesis	—Unverified
VANI: Very-lightweight Accent-controllable TTS for Native and Non-native speakers with Identity Preservation	Mar 14, 2023	DisentanglementSpeech Synthesis	—Unverified
QI-TTS: Questioning Intonation Control for Emotional Speech Synthesis	Mar 14, 2023	Emotional Speech SynthesisSentence	—Unverified

Show:10 25 50

← PrevPage 23 of 50Next →

All datasets LibriTTS North American English LJSpeech Mandarin Chinese Blizzard Challenge 2013

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	PeriodWave-Turbo-L	PESQ	4.45	—	Unverified
2	BigVGAN-v2	PESQ	4.36	—	Unverified
3	EVA-GAN-big	PESQ	4.35	—	Unverified
4	PeriodWave + FreeU	PESQ	4.25	—	Unverified
5	RFWave	PESQ	4.23	—	Unverified
6	BigVSAN (w/ snakebeta)	PESQ	4.12	—	Unverified
7	BigVSAN	PESQ	4.12	—	Unverified
8	EVA-GAN-base	PESQ	4.03	—	Unverified
9	BigVGAN	PESQ	4.03	—	Unverified
10	Vocos	PESQ	3.7	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	4.53	—	Unverified
2	WaveNet (Linguistic)	Mean Opinion Score	4.34	—	Unverified
3	WaveNet (L+F)	Mean Opinion Score	4.21	—	Unverified
4	Tacotron	Mean Opinion Score	4	—	Unverified
5	HMM-driven concatenative	Mean Opinion Score	3.86	—	Unverified
6	LSTM-RNN parametric	Mean Opinion Score	3.67	—	Unverified
7	means	Mean Opinion Score	0	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	BDDM vocoder	Mean Opinion Score	4.48	—	Unverified
2	DiffWave LARGE	Mean Opinion Score	4.44	—	Unverified
3	Neural HMM	Mean Opinion Score	3.24	—	Unverified
4	Neural HMM Ablation with 1 state per phone	Mean Opinion Score	2.68	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	WaveNet (L+F)	Mean Opinion Score	4.08	—	Unverified
2	LSTM-RNN parametric	Mean Opinion Score	3.79	—	Unverified
3	HMM-driven concatenative	Mean Opinion Score	3.47	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SampleRNN (2-tier)	NLL	1.39	—	Unverified
2	SampleRNN (3-tier)	NLL	1.39	—	Unverified