Speech Synthesis

Speech synthesis is the task of generating speech from another modality, such as text or lip movements.

Please note that the leaderboards here are not directly comparable between studies, since they use mean opinion score as a metric and each study collects different samples from Amazon Mechanical Turk.
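As a rough illustration of why such scores do not transfer between studies, the sketch below computes a mean opinion score as the arithmetic mean of 1-5 listener ratings, with a normal-approximation 95% confidence interval. The function name `mean_opinion_score` and both rating lists are hypothetical, made up for illustration only; they are not taken from any paper listed here.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, 95% CI half-width) for a list of 1-5 ratings."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mos = statistics.mean(ratings)
    # Normal-approximation interval: 1.96 * standard error of the mean.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings of the same system from two different rater pools.
study_a = [5, 4, 5, 4, 4, 5, 4, 5]
study_b = [4, 3, 4, 4, 3, 4, 3, 4]

for name, ratings in (("study A", study_a), ("study B", study_b)):
    mos, ci = mean_opinion_score(ratings)
    print(f"{name}: MOS = {mos:.2f} +/- {ci:.2f}")
```

The two pools give the same hypothetical system clearly different scores, so a gap between entries from different studies may reflect the raters and test samples rather than the models.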

(Image credit: WaveNet: A Generative Model for Raw Audio)

Papers


Showing 551–600 of 1249 papers

Title	Date	Tasks	Status	Hype
VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection	Jun 15, 2022	feature selection, Speech Synthesis	Unverified	0
RF-Next: Efficient Receptive Field Search for Convolutional Neural Networks	Jun 14, 2022	Action Segmentation, Instance Segmentation	Code Available	1
BigVGAN: A Universal Neural Vocoder with Large-Scale Training	Jun 9, 2022	Audio Generation, Audio Synthesis	Code Available	3
Unsupervised TTS Acoustic Modeling for TTS with Conditional Disentangled Sequential VAE	Jun 6, 2022	Representation Learning, Speech Representation Learning	Unverified	0
Pronunciation Dictionary-Free Multilingual Speech Synthesis by Combining Unsupervised and Supervised Phonetic Representations	Jun 2, 2022	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
BU-TTS: An Open-Source, Bilingual Welsh-English, Text-to-Speech Corpus	Jun 1, 2022	Speech Synthesis, text-to-speech	Unverified	0
SyntAct: A Synthesized Database of Basic Emotions	Jun 1, 2022	Emotion Recognition, Speech Emotion Recognition	Unverified	0
Investigating Inter- and Intra-speaker Voice Conversion using Audiobooks	Jun 1, 2022	Speech Synthesis, text-to-speech	Unverified	0
Exploring Transfer Learning for Urdu Speech Synthesis	Jun 1, 2022	Speech Synthesis, text-to-speech	Unverified	0
Building Open-source Speech Technology for Low-resource Minority Languages with Sámi as an Example – Tools, Methods and Experiments	Jun 1, 2022	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
AiRO - an Interactive Learning Tool for Children at Risk of Dyslexia	Jun 1, 2022	Speech Synthesis	Unverified	0
Preparing an Endangered Language for the Digital Age: The Case of Judeo-Spanish	May 31, 2022	Machine Translation, Speech Synthesis	Code Available	0
StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis	May 30, 2022	Data Augmentation, Self-Supervised Learning	Code Available	2
TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation	May 25, 2022	Representation Learning, Rhythm	Code Available	1
PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit	May 20, 2022	All, Automatic Speech Recognition (ASR)	Code Available	6
End-to-End Zero-Shot Voice Conversion with Location-Variable Convolutions	May 19, 2022	Speech Synthesis, Style Transfer	Code Available	1
SDS-200: A Swiss German Speech to Standard German Text Corpus	May 19, 2022	Speech Synthesis, Translation	Code Available	0
Macedonian Speech Synthesis for Assistive Technology Applications	May 18, 2022	Deep Learning, Pitch control	Unverified	0
GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech	May 15, 2022	Speech Synthesis, Style Transfer	Code Available	2
Real-Time Packet Loss Concealment With Mixed Generative and Predictive Model	May 11, 2022	Packet Loss Concealment, Speech Enhancement	Code Available	3
Read the Room: Adapting a Robot's Voice to Ambient and Social Contexts	May 10, 2022	Speech Synthesis, Voice Conversion	Code Available	0
Deep Learning Enabled Semantic Communications with Speech Recognition and Synthesis	May 9, 2022	Deep Learning, Semantic Communication	Code Available	1
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality	May 9, 2022	Sentence, Speech Synthesis	Code Available	2
ReCAB-VAE: Gumbel-Softmax Variational Inference Based on Analytic Divergence	May 9, 2022	Speech Synthesis, text-to-speech	Unverified	0
SVTS: Scalable Video-to-Speech Synthesis	May 4, 2022	Speech Synthesis	Code Available	1
Attentive activation function for improving end-to-end spoofing countermeasure systems	May 3, 2022	Speech Synthesis, Voice Conversion	Unverified	0
Requirements and Motivations of Low-Resource Speech Synthesis for Language Revitalization	May 1, 2022	Speech Synthesis	Code Available	1
Systematic Inequalities in Language Technology Performance across the World’s Languages	May 1, 2022	Dependency Parsing, Machine Translation	Code Available	0
Improving Self-Supervised Learning-based MOS Prediction Networks	Apr 23, 2022	Prediction, Quantization	Code Available	0
FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis	Apr 21, 2022	Denoising, GPU	Code Available	2
A Survey on Non-Autoregressive Generation for Neural Machine Translation and Beyond	Apr 20, 2022	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	1
Exploration strategies for articulatory synthesis of complex syllable onsets	Apr 20, 2022	Speech Synthesis	Code Available	0
A Post Auto-regressive GAN Vocoder Focused on Spectrum Fracture	Apr 12, 2022	Speech Synthesis	Unverified	0
Fine-grained Noise Control for Multispeaker Speech Synthesis	Apr 11, 2022	Expressive Speech Synthesis, Speech Synthesis	Unverified	0
The PartialSpoof Database and Countermeasures for the Detection of Short Fake Speech Segments Embedded in an Utterance	Apr 11, 2022	Speaker Verification, Speech Synthesis	Unverified	0
Unsupervised Quantized Prosody Representation for Controllable Speech Synthesis	Apr 7, 2022	Quantization, Speech Synthesis	Unverified	0
MAESTRO: Matched Speech Text Representations through Modality Matching	Apr 7, 2022	Language Modelling, Self-Supervised Learning	Unverified	0
Self-supervised learning for robust voice cloning	Apr 7, 2022	Self-Supervised Learning, Speech Synthesis	Unverified	0
DDOS: A MOS Prediction Framework utilizing Domain Adaptive Pre-training and Distribution of Opinion Scores	Apr 7, 2022	Self-Supervised Learning, Speech Synthesis	Unverified	0
SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural Text-to-Speech Synthesis	Apr 6, 2022	Speech Synthesis, text-to-speech	Unverified	0
Simple and Effective Unsupervised Speech Synthesis	Apr 6, 2022	speech-recognition, Speech Recognition	Unverified	0
A Comparison of Deep Learning MOS Predictors for Speech Synthesis Quality	Apr 5, 2022	Benchmarking, Self-Supervised Learning	Unverified	0
Lip to Speech Synthesis with Visual Context Attentional GAN	Apr 4, 2022	Contrastive Learning, Generative Adversarial Network	Code Available	1
VQTTS: High-Fidelity Text-to-Speech Synthesis with Self-Supervised VQ Acoustic Feature	Apr 2, 2022	Speech Synthesis, text-to-speech	Unverified	0
Universal Adaptor: Converting Mel-Spectrograms Between Different Configurations for Speech Synthesis	Apr 1, 2022	Speech Synthesis, Voice Conversion	Code Available	0
Residual-guided Personalized Speech Synthesis based on Face Image	Apr 1, 2022	Speech Synthesis	Unverified	0
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios	Apr 1, 2022	Speech Synthesis, text-to-speech	Unverified	0
WavThruVec: Latent speech representation as intermediate features for neural speech synthesis	Mar 31, 2022	Speech Synthesis, text-to-speech	Unverified	0
ASR data augmentation in low-resource settings using cross-lingual multi-speaker TTS and cross-lingual voice conversion	Mar 29, 2022	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	1
Applying Syntax–Prosody Mapping Hypothesis and Prosodic Well-Formedness Constraints to Neural Sequence-to-Sequence Speech Synthesis	Mar 29, 2022	Speech Synthesis, text-to-speech	Unverified	0



Datasets (one per table under Benchmark Results, in order): LibriTTS, North American English, LJSpeech, Mandarin Chinese, Blizzard Challenge 2013

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	PeriodWave-Turbo-L	PESQ	4.45	—	Unverified
2	BigVGAN-v2	PESQ	4.36	—	Unverified
3	EVA-GAN-big	PESQ	4.35	—	Unverified
4	PeriodWave + FreeU	PESQ	4.25	—	Unverified
5	RFWave	PESQ	4.23	—	Unverified
6	BigVSAN (w/ snakebeta)	PESQ	4.12	—	Unverified
7	BigVSAN	PESQ	4.12	—	Unverified
8	EVA-GAN-base	PESQ	4.03	—	Unverified
9	BigVGAN	PESQ	4.03	—	Unverified
10	Vocos	PESQ	3.7	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	4.53	—	Unverified
2	WaveNet (Linguistic)	Mean Opinion Score	4.34	—	Unverified
3	WaveNet (L+F)	Mean Opinion Score	4.21	—	Unverified
4	Tacotron	Mean Opinion Score	4	—	Unverified
5	HMM-driven concatenative	Mean Opinion Score	3.86	—	Unverified
6	LSTM-RNN parametric	Mean Opinion Score	3.67	—	Unverified
7	means	Mean Opinion Score	0	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	BDDM vocoder	Mean Opinion Score	4.48	—	Unverified
2	DiffWave LARGE	Mean Opinion Score	4.44	—	Unverified
3	Neural HMM	Mean Opinion Score	3.24	—	Unverified
4	Neural HMM Ablation with 1 state per phone	Mean Opinion Score	2.68	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	WaveNet (L+F)	Mean Opinion Score	4.08	—	Unverified
2	LSTM-RNN parametric	Mean Opinion Score	3.79	—	Unverified
3	HMM-driven concatenative	Mean Opinion Score	3.47	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SampleRNN (2-tier)	NLL	1.39	—	Unverified
2	SampleRNN (3-tier)	NLL	1.39	—	Unverified