Speech Synthesis

Speech synthesis is the task of generating speech from another modality, such as text or lip movements.

Please note that the leaderboards here are not really comparable between studies: they use mean opinion score (MOS) as the metric, and each study collects a different sample of ratings from Amazon Mechanical Turk.
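As a rough illustration of why mean opinion scores from separate listening tests are hard to compare, the sketch below computes a mean opinion score and an approximate 95% confidence interval from a list of 1-5 ratings. The function name `mean_opinion_score`, the normal-approximation interval, and both rating lists are assumptions made up for this example; they do not come from any study listed here.

```python
import math
import statistics


def mean_opinion_score(ratings, z=1.96):
    """Return (mean, half_width) for a list of 1-5 listener ratings.

    half_width is a normal-approximation 95% confidence interval,
    which is only a rough guide for small rater pools.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mean = statistics.mean(ratings)
    # Sample standard deviation divided by sqrt(n) gives the standard error.
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width


# Hypothetical ratings from two separate listening tests (invented data).
study_a = [5, 4, 5, 4, 4, 5, 3, 5]
study_b = [4, 4, 3, 5, 4, 4, 3, 4]

mos_a, ci_a = mean_opinion_score(study_a)
mos_b, ci_b = mean_opinion_score(study_b)
print(f"Study A: {mos_a:.2f} +/- {ci_a:.2f}")
print(f"Study B: {mos_b:.2f} +/- {ci_b:.2f}")
```

Even when the intervals are reported, the two numbers come from different raters judging different audio samples, so a gap between them may reflect the test setup rather than the systems.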

(Image credit: WaveNet: A Generative Model for Raw Audio)

Papers


Showing 451–500 of 1249 papers

Title	Date	Tasks	Status	Hype
On granularity of prosodic representations in expressive text-to-speech	Jan 26, 2023	Expressive Speech Synthesis, Speech Synthesis	Unverified	0
Multilingual Multiaccented Multispeaker TTS with RADTTS	Jan 24, 2023	Speech Synthesis	Unverified	0
Regeneration Learning: A Learning Paradigm for Data Generation	Jan 21, 2023	Image Generation, Representation Learning	Unverified	0
Applying Automated Machine Translation to Educational Video Courses	Jan 9, 2023	Machine Translation, Speech Synthesis	Unverified	0
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers	Jan 5, 2023	In-Context Learning, Language Modeling	Code Available	7
Towards Voice Reconstruction from EEG during Imagined Speech	Jan 2, 2023	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	1
ReVISE: Self-Supervised Speech Resynthesis With Visual Input for Universal and Generalized Speech Regeneration	Jan 1, 2023	Audio-Visual Speech Recognition, Resynthesis	Unverified	0
HMM-based data augmentation for E2E systems for building conversational speech synthesis systems	Dec 22, 2022	Data Augmentation, Language Modeling	Unverified	0
ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement	Dec 21, 2022	Audio-Visual Speech Recognition, Resynthesis	Unverified	0
Investigation of Japanese PnG BERT language model in text-to-speech synthesis for pitch accent language	Dec 16, 2022	Language Modeling, Language Modelling	Unverified	0
Text-to-speech synthesis based on latent variable conversion using diffusion probabilistic model and variational autoencoder	Dec 16, 2022	Representation Learning, Speech Synthesis	Unverified	0
RWEN-TTS: Relation-aware Word Encoding Network for Natural Text-to-Speech Synthesis	Dec 15, 2022	Relation, Speech Synthesis	Code Available	1
Style-Label-Free: Cross-Speaker Style Transfer by Quantized VAE and Speaker-wise Normalization in Speech Synthesis	Dec 13, 2022	Data Augmentation, Speech Synthesis	Unverified	0
MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset	Dec 11, 2022	Speech Synthesis, text-to-speech	Code Available	1
VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing	Nov 30, 2022	Machine Translation, Sentence	Unverified	0
SNAC: Speaker-normalized affine coupling layer in flow-based architecture for zero-shot multi-speaker text-to-speech	Nov 30, 2022	Speech Synthesis, text-to-speech	Unverified	0
Controllable speech synthesis by learning discrete phoneme-level prosodic representations	Nov 29, 2022	Clustering, Speech Synthesis	Unverified	0
Contextual Expressive Text-to-Speech	Nov 26, 2022	Speech Synthesis, text-to-speech	Unverified	0
Efficient Incremental Text-to-Speech on GPUs	Nov 25, 2022	GPU, Speech Synthesis	Unverified	0
PromptTTS: Controllable Text-to-Speech with Text Descriptions	Nov 22, 2022	Decoder, Speech Synthesis	Code Available	0
Embedding a Differentiable Mel-cepstral Synthesis Filter to a Neural Speech Synthesis System	Nov 21, 2022	GPU, Speech Synthesis	Code Available	1
LA-VocE: Low-SNR Audio-visual Speech Enhancement using Neural Vocoders	Nov 20, 2022	Speech Enhancement, Speech Synthesis	Unverified	0
Multi-Speaker Expressive Speech Synthesis via Multiple Factors Decoupling	Nov 19, 2022	Expressive Speech Synthesis, Speech Synthesis	Unverified	0
Audio Anti-spoofing Using a Simple Attention Module and Joint Optimization Based on Additive Angular Margin Loss and Meta-learning	Nov 17, 2022	Binary Classification, Meta-Learning	Unverified	0
Towards Building Text-To-Speech Systems for the Next Billion Users	Nov 17, 2022	Diversity, Speech Synthesis	Code Available	2
Grad-StyleSpeech: Any-speaker Adaptive Text-to-Speech Synthesis with Diffusion Models	Nov 17, 2022	Speech Synthesis, text-to-speech	Unverified	0
The Potential of Neural Speech Synthesis-based Data Augmentation for Personalized Speech Enhancement	Nov 14, 2022	Data Augmentation, Speech Enhancement	Unverified	0
OverFlow: Putting flows on top of neural transducers for better TTS	Nov 13, 2022	Normalising Flows, Speech Synthesis	Code Available	1
Semi-supervised learning for continuous emotional intensity controllable speech synthesis with disentangled representations	Nov 11, 2022	Emotional Speech Synthesis, Speech Synthesis	Unverified	0
PhaseAug: A Differentiable Augmentation for Speech Synthesis to Simulate One-to-Many Mapping	Nov 8, 2022	Generative Adversarial Network, Speech Synthesis	Code Available	1
ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech	Nov 7, 2022	Representation Learning, Speech Representation Learning	Code Available	6
Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder	Nov 7, 2022	Speech Synthesis, text-to-speech	Code Available	1
Deliberation Networks and How to Train Them	Nov 6, 2022	Machine Translation, Speech Synthesis	Unverified	0
Self-Supervised Learning for Speech Enhancement through Synthesis	Nov 4, 2022	Denoising, Self-Supervised Learning	Code Available	0
SAMO: Speaker Attractor Multi-Center One-Class Learning for Voice Anti-Spoofing	Nov 4, 2022	Diversity, Speaker Verification	Code Available	1
Predicting phoneme-level prosody latents using AR and flow-based Prior Networks for expressive speech synthesis	Nov 2, 2022	Expressive Speech Synthesis, Speech Synthesis	Unverified	0
Taiwanese-Accented Mandarin and English Multi-Speaker Talking-Face Synthesis System	Nov 1, 2022	Face Generation, Speech Synthesis	Unverified	0
A Preliminary Study on Mandarin-Hakka neural machine translation using small-sized data	Nov 1, 2022	Machine Translation, Speech Synthesis	Unverified	0
Development of Mandarin-English code-switching speech synthesis system	Nov 1, 2022	Sentence, Speech Synthesis	Unverified	0
Technology Pipeline for Large Scale Cross-Lingual Dubbing of Lecture Videos into Multiple Indian Languages	Nov 1, 2022	Chunking, Rhythm	Unverified	0
Learning utterance-level representations through token-level acoustic latents prediction for Expressive Speech Synthesis	Nov 1, 2022	Disentanglement, Diversity	Unverified	0
Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers	Nov 1, 2022	parameter-efficient fine-tuning, Speech Synthesis	Unverified	0
Towards Developing State-of-the-Art TTS Synthesisers for 13 Indian Languages with Signal Processing aided Alignments	Oct 31, 2022	Speech Synthesis	Unverified	0
Period VITS: Variational Inference with Explicit Pitch Modeling for End-to-end Emotional Speech Synthesis	Oct 28, 2022	Decoder, Diversity	Unverified	0
Evaluating context-invariance in unsupervised speech representations	Oct 27, 2022	Language Modelling, speech-recognition	Code Available	0
FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis	Oct 27, 2022	Speech Synthesis, text-to-speech	Code Available	1
Articulation GAN: Unsupervised modeling of articulatory learning	Oct 27, 2022	Generative Adversarial Network, Speech Synthesis	Code Available	1
A Fast and Accurate Pitch Estimation Algorithm Based on the Pseudo Wigner-Ville Distribution	Oct 27, 2022	Speech Synthesis	Code Available	0
Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech	Oct 27, 2022	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
RedPen: Region- and Reason-Annotated Dataset of Unnatural Speech	Oct 26, 2022	Speech Synthesis	Unverified	0


Datasets covered by the benchmark tables below, in order: LibriTTS, North American English, LJSpeech, Mandarin Chinese, Blizzard Challenge 2013

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	PeriodWave-Turbo-L	PESQ	4.45	—	Unverified
2	BigVGAN-v2	PESQ	4.36	—	Unverified
3	EVA-GAN-big	PESQ	4.35	—	Unverified
4	PeriodWave + FreeU	PESQ	4.25	—	Unverified
5	RFWave	PESQ	4.23	—	Unverified
6	BigVSAN (w/ snakebeta)	PESQ	4.12	—	Unverified
7	BigVSAN	PESQ	4.12	—	Unverified
8	EVA-GAN-base	PESQ	4.03	—	Unverified
9	BigVGAN	PESQ	4.03	—	Unverified
10	Vocos	PESQ	3.7	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	4.53	—	Unverified
2	WaveNet (Linguistic)	Mean Opinion Score	4.34	—	Unverified
3	WaveNet (L+F)	Mean Opinion Score	4.21	—	Unverified
4	Tacotron	Mean Opinion Score	4	—	Unverified
5	HMM-driven concatenative	Mean Opinion Score	3.86	—	Unverified
6	LSTM-RNN parametric	Mean Opinion Score	3.67	—	Unverified
7	means	Mean Opinion Score	0	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	BDDM vocoder	Mean Opinion Score	4.48	—	Unverified
2	DiffWave LARGE	Mean Opinion Score	4.44	—	Unverified
3	Neural HMM	Mean Opinion Score	3.24	—	Unverified
4	Neural HMM Ablation with 1 state per phone	Mean Opinion Score	2.68	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	WaveNet (L+F)	Mean Opinion Score	4.08	—	Unverified
2	LSTM-RNN parametric	Mean Opinion Score	3.79	—	Unverified
3	HMM-driven concatenative	Mean Opinion Score	3.47	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SampleRNN (2-tier)	NLL	1.39	—	Unverified
2	SampleRNN (3-tier)	NLL	1.39	—	Unverified