Speech Synthesis

Speech synthesis is the task of generating speech from another modality, such as text or lip movements.

Please note that the leaderboards here are not directly comparable between studies, because they use mean opinion score (MOS) as the metric and each study collects its own samples from Amazon Mechanical Turk.
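As a rough illustration of why MOS values from separate studies are hard to compare, the sketch below computes a MOS and a normal-approximation 95% confidence interval for two listener pools. The function name `mean_opinion_score`, the rating lists, and the use of z = 1.96 are assumptions made for this example; they are not taken from any of the listed papers.

```python
import math
import statistics


def mean_opinion_score(ratings, z=1.96):
    """Return (mos, half_width) for a list of 1-5 listener ratings.

    half_width is a normal-approximation confidence half-width
    (z = 1.96 corresponds to roughly 95%); this is an assumed,
    simplified procedure, not a standardised evaluation protocol.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate an interval")
    mos = statistics.fmean(ratings)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, z * sem


# Hypothetical ratings of the same system from two different listener pools.
study_a = [5, 4, 5, 4, 4, 5, 3, 4, 5, 4]
study_b = [4, 3, 4, 4, 3, 5, 3, 4, 3, 4]

for name, ratings in (("study A", study_a), ("study B", study_b)):
    mos, half_width = mean_opinion_score(ratings)
    print(f"{name}: MOS = {mos:.2f} +/- {half_width:.2f}")
```

With these made-up ratings the two pools disagree by more than half a point on the same system, which is comparable to the gaps between neighbouring leaderboard entries.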

(Image credit: WaveNet: A Generative Model for Raw Audio)

Papers


Showing 551–600 of 1249 papers

Title	Date	Tasks	Status
Deep Performer: Score-to-Audio Music Performance Synthesis	Feb 12, 2022	Decoder, Speech Synthesis	Unverified
Deep MOS Predictor for Synthetic Speech Using Cluster-Based Modeling	Aug 9, 2020	Deep Learning, Speech Synthesis	Unverified
A Unified Transformer-based Framework for Duplex Text Normalization	Aug 23, 2021	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Deep Learning the EEG Manifold for Phonological Categorization from Active Thoughts	Apr 8, 2019	Binary Classification, Deep Learning	Unverified
A Unified Speaker Adaptation Method for Speech Synthesis using Transcribed and Untranscribed Speech with Backpropagation	Jun 18, 2019	Decoder, Speech Synthesis	Unverified
Analysis of Voice Conversion and Code-Switching Synthesis Using VQ-VAE	Mar 28, 2022	Speech Synthesis, Voice Conversion	Unverified
Accented Text-to-Speech Synthesis with Limited Data	May 8, 2023	Speech Synthesis, text-to-speech	Unverified
iSTFTNet2: Faster and More Lightweight iSTFT-Based Neural Vocoder Using 1D-2D CNN	Aug 14, 2023	Speech Synthesis	Unverified
A unified sequence-to-sequence front-end model for Mandarin text-to-speech synthesis	Nov 11, 2019	Polyphone disambiguation, Speech Synthesis	Unverified
Analysis of artifacts in EEG signals for building BCIs	Sep 18, 2020	Brain Computer Interface, Dynamic Time Warping	Unverified
Deep Feed-forward Sequential Memory Networks for Speech Synthesis	Feb 26, 2018	speech-recognition, Speech Recognition	Unverified
A unified lexical processing framework based on the Margin Infused Relaxed Algorithm. A case study on the Romanian Language	Sep 1, 2013	Lemmatization, Speech Synthesis	Unverified
A Data-Driven Investigation of Noise-Adaptive Utterance Generation with Linguistic Modification	Oct 19, 2022	Speech Synthesis, Text Generation	Unverified
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios	Apr 1, 2022	Speech Synthesis, text-to-speech	Unverified
Deep Denoising Auto-encoder for Statistical Speech Synthesis	Jun 17, 2015	Denoising, Speech Synthesis	Unverified
DeepA: A Deep Neural Analyzer For Speech And Singing Vocoding	Oct 13, 2021	Speech Synthesis, Voice Conversion	Unverified
A unified front-end framework for English text-to-speech synthesis	May 18, 2023	Speech Synthesis, Text Normalization	Unverified
Decoupled Pronunciation and Prosody Modeling in Meta-Learning-Based Multilingual Speech Synthesis	Sep 14, 2022	Decoder, Meta-Learning	Unverified
Debatts: Zero-Shot Debating Text-to-Speech Synthesis	Nov 10, 2024	Speech Synthesis, text-to-speech	Unverified
A Unified Framework for Collecting Text-to-Speech Synthesis Datasets for 22 Indian Languages	Oct 18, 2024	Speech Synthesis, text-to-speech	Unverified
Analysis and Synthesis of Hypo and Hyperarticulated Speech	Jun 7, 2020	Speech Synthesis	Unverified
Investigating gated recurrent neural networks for speech synthesis	Jan 11, 2016	Speech Synthesis	Unverified
Augmenting Polish Automatic Speech Recognition System With Synthetic Data	Oct 30, 2024	Automatic Speech Recognition, speech-recognition	Unverified
DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech	Oct 17, 2024	Disentanglement, Quantization	Unverified
Analysing Shortcomings of Statistical Parametric Speech Synthesis	Jul 28, 2018	Speech Synthesis	Unverified
Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis	Jun 15, 2021	Speech Synthesis, text-to-speech	Unverified
Improving Prosody for Cross-Speaker Style Transfer by Semi-Supervised Style Extractor and Hierarchical Modeling in Speech Synthesis	Mar 14, 2023	Prosody Prediction, Speech Synthesis	Unverified
Cross-Utterance Conditioned VAE for Speech Generation	Sep 8, 2023	Speech Synthesis, text-to-speech	Unverified
Augmenting Images for ASR and TTS through Single-loop and Dual-loop Multimodal Chain Framework	Nov 4, 2020	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Accent conversion using discrete units with parallel data synthesized from controllable accented TTS	Sep 30, 2024	Data Augmentation, Speech Synthesis	Unverified
Improving LPCNet-based Text-to-Speech with Linear Prediction-structured Mixture Density Network	Jan 31, 2020	Quantization, Speech Synthesis	Unverified
Improving homograph disambiguation with supervised machine learning	May 1, 2018	BIG-bench Machine Learning, Speech Synthesis	Unverified
CrossSpeech: Speaker-independent Acoustic Representation for Cross-lingual Speech Synthesis	Feb 28, 2023	Speech Synthesis, text-to-speech	Unverified
CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation	Dec 28, 2024	Speech Synthesis	Unverified
Improving Prosody for Unseen Texts in Speech Synthesis by Utilizing Linguistic Information and Noisy Data	Nov 15, 2021	Chinese Word Segmentation, Multi-Task Learning	Unverified
Improving Prosody Modelling with Cross-Utterance BERT Embeddings for End-to-end Speech Synthesis	Nov 6, 2020	Decoder, Sentence	Unverified
Improving Robustness of Diffusion-Based Zero-Shot Speech Synthesis via Stable Formant Generation	Sep 14, 2024	Speech Synthesis, text-to-speech	Unverified
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment	Jun 25, 2024	Decoder, Language Modeling	Unverified
DDOS: A MOS Prediction Framework utilizing Domain Adaptive Pre-training and Distribution of Opinion Scores	Apr 7, 2022	Self-Supervised Learning, Speech Synthesis	Unverified
Improving speech synthesis quality by reducing pitch peaks in the source recordings	Jun 1, 2013	Speech Synthesis	Unverified
Improving Trajectory Modelling for DNN-based Speech Synthesis by using Stacked Bottleneck Features and Minimum Generation Error Training	Feb 22, 2016	Speech Synthesis	Unverified
Incorporating speaker embedding and post-filter network for improving speaker similarity of personalized speech synthesis system	Oct 1, 2021	Speaker Verification, Speech Synthesis	Unverified
Incremental Coordination: Attention-Centric Speech Production in a Physically Situated Conversational Agent	Sep 1, 2015	Speech Synthesis	Unverified
Incremental Disentanglement for Environment-Aware Zero-Shot Text-to-Speech Synthesis	Dec 22, 2024	Decoder, Disentanglement	Unverified
Incremental FastPitch: Chunk-based High Quality Text to Speech	Jan 3, 2024	Speech Synthesis, text-to-speech	Unverified
Incremental Machine Speech Chain Towards Enabling Listening while Speaking in Real-time	Nov 4, 2020	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
From Start to Finish: Latency Reduction Strategies for Incremental Speech Synthesis in Simultaneous Speech-to-Speech Translation	Oct 15, 2021	Data Augmentation, Simultaneous Speech-to-Speech Translation	Unverified
Incremental Text-to-Speech Synthesis with Prefix-to-Prefix Framework	Nov 7, 2019	Sentence, Speech Synthesis	Unverified
Deep Encoder-Decoder Models for Unsupervised Learning of Controllable Speech Synthesis	Jul 30, 2018	Acoustic Modelling, Decoder	Unverified
Improving Cross-lingual Speech Synthesis with Triplet Training Scheme	Feb 22, 2022	Speech Synthesis, text-to-speech	Unverified


Datasets: LibriTTS, North American English, LJSpeech, Mandarin Chinese, Blizzard Challenge 2013

Benchmark Results

LibriTTS

#	Model	Metric	Claimed	Verified	Status
1	PeriodWave-Turbo-L	PESQ	4.45	—	Unverified
2	BigVGAN-v2	PESQ	4.36	—	Unverified
3	EVA-GAN-big	PESQ	4.35	—	Unverified
4	PeriodWave + FreeU	PESQ	4.25	—	Unverified
5	RFWave	PESQ	4.23	—	Unverified
6	BigVSAN (w/ snakebeta)	PESQ	4.12	—	Unverified
7	BigVSAN	PESQ	4.12	—	Unverified
8	EVA-GAN-base	PESQ	4.03	—	Unverified
9	BigVGAN	PESQ	4.03	—	Unverified
10	Vocos	PESQ	3.7	—	Unverified

North American English

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	4.53	—	Unverified
2	WaveNet (Linguistic)	Mean Opinion Score	4.34	—	Unverified
3	WaveNet (L+F)	Mean Opinion Score	4.21	—	Unverified
4	Tacotron	Mean Opinion Score	4	—	Unverified
5	HMM-driven concatenative	Mean Opinion Score	3.86	—	Unverified
6	LSTM-RNN parametric	Mean Opinion Score	3.67	—	Unverified
7	means	Mean Opinion Score	0	—	Unverified

LJSpeech

#	Model	Metric	Claimed	Verified	Status
1	BDDM vocoder	Mean Opinion Score	4.48	—	Unverified
2	DiffWave LARGE	Mean Opinion Score	4.44	—	Unverified
3	Neural HMM	Mean Opinion Score	3.24	—	Unverified
4	Neural HMM Ablation with 1 state per phone	Mean Opinion Score	2.68	—	Unverified

Mandarin Chinese

#	Model	Metric	Claimed	Verified	Status
1	WaveNet (L+F)	Mean Opinion Score	4.08	—	Unverified
2	LSTM-RNN parametric	Mean Opinion Score	3.79	—	Unverified
3	HMM-driven concatenative	Mean Opinion Score	3.47	—	Unverified

Blizzard Challenge 2013

#	Model	Metric	Claimed	Verified	Status
1	SampleRNN (2-tier)	NLL	1.39	—	Unverified
2	SampleRNN (3-tier)	NLL	1.39	—	Unverified