Speech Synthesis

Speech synthesis is the task of generating speech from another modality, such as text or lip movements.

Please note that the leaderboards here are not directly comparable between studies, because they use mean opinion score as a metric and each study collects its ratings from a different sample of Amazon Mechanical Turk workers.
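As a rough illustration of why mean opinion scores from different studies do not line up, the sketch below computes a MOS and a normal-approximation 95% confidence interval from listener ratings. The function name `mean_opinion_score` and both rating lists are hypothetical and invented for this example; they are not taken from any study listed here.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, 95% CI half-width) for a list of 1-5 listener ratings."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    mos = statistics.mean(ratings)
    # Normal-approximation interval: 1.96 * standard error of the mean.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings of the same system from two different listener pools.
study_a = [5, 4, 5, 4, 4, 5, 4, 5]
study_b = [4, 3, 4, 4, 3, 4, 5, 3]

for name, ratings in (("study A", study_a), ("study B", study_b)):
    mos, hw = mean_opinion_score(ratings)
    print(f"{name}: MOS = {mos:.2f} +/- {hw:.2f}")
```

The same system can receive noticeably different scores depending on who rates it and which utterances they hear, so a gap between two studies' MOS values does not by itself indicate a better model.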

(Image credit: WaveNet: A Generative Model for Raw Audio)

Papers

Showing 1101–1125 of 1249 papers

Title	Date	Tasks	Status	Hype
SampleRNN: An Unconditional End-to-End Neural Audio Generation Model	Dec 22, 2016	Audio Generation, Speech Synthesis	Code Available	0
基於字元階層之語音合成用文脈訊息擷取 (Character-Level Linguistic Features Extraction for Text-to-Speech System) [In Chinese]	Dec 1, 2016	Feature Engineering, Speech Synthesis	Unverified	0
Weakly-supervised text-to-speech alignment confidence measure	Dec 1, 2016	speech-recognition, Speech Recognition	Unverified	0
Automatic Syllabification for Manipuri language	Dec 1, 2016	Automatic Speech Recognition (ASR), Segmentation	Unverified	0
papago: A Machine Translation Service with Word Sense Disambiguation and Currency Conversion	Dec 1, 2016	Machine Translation, Optical Character Recognition (OCR)	Unverified	0
Continuous Expressive Speaking Styles Synthesis based on CVSM and MR-HMM	Dec 1, 2016	Expressive Speech Synthesis, Speech Recognition	Unverified	0
An Overview of BPPT's Indonesian Language Resources	Dec 1, 2016	Machine Translation, speech-recognition	Unverified	0
Combining Human Inputters and Language Services to provide Multi-language support system for International Symposiums	Dec 1, 2016	Automatic Speech Recognition (ASR), Machine Translation	Unverified	0
Large-scale Analysis of Spoken Free-verse Poetry	Dec 1, 2016	Speech Synthesis	Unverified	0
A Survey of Voice Translation Methodologies - Acoustic Dialect Decoder	Oct 13, 2016	Decoder, Sentence	Unverified	0
Dictionary Update for NMF-based Voice Conversion Using an Encoder-Decoder Network	Oct 13, 2016	Decoder, Speech Enhancement	Unverified	0
WaveNet: A Generative Model for Raw Audio	Sep 12, 2016	Audio Generation, model	Code Available	1
Median-Based Generation of Synthetic Speech Durations using a Non-Parametric Approach	Aug 22, 2016	Speech Synthesis	Unverified	0
DNN-based Speech Synthesis for Indian Languages from ASCII text	Aug 18, 2016	Speech Synthesis, text-to-speech	Unverified	0
OpenDial: A Toolkit for Developing Spoken Dialogue Systems with Probabilistic Rules	Aug 1, 2016	Dialogue Management, Speech Recognition	Unverified	0
De l'utilisation de descripteurs issus de la linguistique computationnelle dans le cadre de la synthèse par HMM (Toward the use of information density based descriptive features in HMM based speech synthesis)	Jul 1, 2016	Descriptive, SENTER	Unverified	0
Adaptation de la prononciation pour la synthèse de la parole spontanée en utilisant des informations linguistiques (Pronunciation adaptation for spontaneous speech synthesis using linguistic information)	Jul 1, 2016	Speech Synthesis	Unverified	0
Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices	Jun 20, 2016	Quantization, Speech Synthesis	Unverified	0
Statistical Parametric Speech Synthesis Using Bottleneck Representation From Sequence Auto-encoder	Jun 19, 2016	Speech Synthesis	Unverified	0
Design and development a children's speech database	May 25, 2016	speech-recognition, Speech Recognition	Unverified	0
Combining Manual and Automatic Prosodic Annotation for Expressive Speech Synthesis	May 1, 2016	Expressive Speech Synthesis, Speech Synthesis	Unverified	0
CHATR the Corpus; a 20-year-old archive of Concatenative Speech Synthesis	May 1, 2016	Speech Synthesis	Unverified	0
Speech Synthesis of Code-Mixed Text	May 1, 2016	Language Identification, Speech Synthesis	Unverified	0
Phonetic Inventory for an Arabic Speech Corpus	May 1, 2016	Speech Synthesis	Unverified	0
A Taxonomy of Specific Problem Classes in Text-to-Speech Synthesis: Comparing Commercial and Open Source Performance	May 1, 2016	Speech Synthesis, text-to-speech	Unverified	0

Datasets: LibriTTS, North American English, LJSpeech, Mandarin Chinese, Blizzard Challenge 2013

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	PeriodWave-Turbo-L	PESQ	4.45	—	Unverified
2	BigVGAN-v2	PESQ	4.36	—	Unverified
3	EVA-GAN-big	PESQ	4.35	—	Unverified
4	PeriodWave + FreeU	PESQ	4.25	—	Unverified
5	RFWave	PESQ	4.23	—	Unverified
6	BigVSAN (w/ snakebeta)	PESQ	4.12	—	Unverified
7	BigVSAN	PESQ	4.12	—	Unverified
8	EVA-GAN-base	PESQ	4.03	—	Unverified
9	BigVGAN	PESQ	4.03	—	Unverified
10	Vocos	PESQ	3.7	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	4.53	—	Unverified
2	WaveNet (Linguistic)	Mean Opinion Score	4.34	—	Unverified
3	WaveNet (L+F)	Mean Opinion Score	4.21	—	Unverified
4	Tacotron	Mean Opinion Score	4	—	Unverified
5	HMM-driven concatenative	Mean Opinion Score	3.86	—	Unverified
6	LSTM-RNN parametric	Mean Opinion Score	3.67	—	Unverified
7	means	Mean Opinion Score	0	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	BDDM vocoder	Mean Opinion Score	4.48	—	Unverified
2	DiffWave LARGE	Mean Opinion Score	4.44	—	Unverified
3	Neural HMM	Mean Opinion Score	3.24	—	Unverified
4	Neural HMM Ablation with 1 state per phone	Mean Opinion Score	2.68	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	WaveNet (L+F)	Mean Opinion Score	4.08	—	Unverified
2	LSTM-RNN parametric	Mean Opinion Score	3.79	—	Unverified
3	HMM-driven concatenative	Mean Opinion Score	3.47	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	SampleRNN (2-tier)	NLL	1.39	—	Unverified
2	SampleRNN (3-tier)	NLL	1.39	—	Unverified