Text-To-Speech Synthesis

Text-To-Speech Synthesis is a machine learning task that converts written text into spoken words. The goal is to generate synthetic speech that sounds natural and resembles human speech as closely as possible.

Papers

Showing 76–100 of 332 papers

Title	Date	Tasks	Status	Hype
Guided Flows for Generative Modeling and Decision Making	Nov 22, 2023	Conditional Image Generation, Decision Making	Unverified	0
Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning	Nov 7, 2023	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	1
Generative Pre-training for Speech with Flow Matching	Oct 25, 2023	Speech Enhancement, Speech Synthesis	Unverified	0
Back Transcription as a Method for Evaluating Robustness of Natural Language Understanding Models to Speech Recognition Errors	Oct 25, 2023	en-US domain classification, en-US Intent Classification	Code Available	0
ArTST: Arabic Text and Speech Transformer	Oct 25, 2023	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	1
Generative Adversarial Training for Text-to-Speech Synthesis Based on Raw Phonetic Input and Explicit Prosody Modelling	Oct 14, 2023	Speech Synthesis, text-to-speech	Code Available	2
Attentive Multi-Layer Perceptron for Non-autoregressive Generation	Oct 14, 2023	Machine Translation, Speech Synthesis	Code Available	0
Unified speech and gesture synthesis using flow matching	Oct 8, 2023	Audio Synthesis, Motion Synthesis	Unverified	0
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT	Oct 7, 2023	Audio captioning, Automatic Speech Recognition	Code Available	2
The VoiceMOS Challenge 2023: Zero-shot Subjective Speech Quality Prediction for Multiple Domains	Oct 4, 2023	Speech Synthesis, text-to-speech	Unverified	0
DurIAN-E: Duration Informed Attention Network For Expressive Text-to-Speech Synthesis	Sep 22, 2023	Denoising, Speech Synthesis	Unverified	0
FunCodec: A Fundamental, Reproducible and Integrable Open-source Toolkit for Neural Speech Codec	Sep 14, 2023	Automatic Speech Recognition, speech-recognition	Code Available	2
Matcha-TTS: A fast TTS architecture with conditional flow matching	Sep 6, 2023	Acoustic Modelling, Decoder	Code Available	3
The FruitShell French synthesis system at the Blizzard 2023 Challenge	Sep 1, 2023	Data Augmentation, Speech Synthesis	Unverified	0
QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning	Aug 31, 2023	Representation Learning, Speech Representation Learning	Code Available	1
Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis	Aug 31, 2023	Expressive Speech Synthesis, Sentence	Unverified	0
Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation	Aug 3, 2023	Decoder, Quantization	Code Available	1
SALTTS: Leveraging Self-Supervised Speech Representations for improved Text-to-Speech Synthesis	Aug 2, 2023	Decoder, Self-Supervised Learning	Unverified	0
Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech	Jul 31, 2023	Acoustic Modelling, Speech Synthesis	Unverified	0
SLMGAN: Exploiting Speech Language Model Representations for Unsupervised Zero-Shot Voice Conversion in GANs	Jul 18, 2023	Generative Adversarial Network, Language Modeling	Unverified	0
High-Quality Automatic Voice Over with Accurate Alignment: Supervision through Self-Supervised Discrete Speech Units	Jun 29, 2023	Speech Synthesis, text-to-speech	Unverified	0
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale	Jun 23, 2023	In-Context Learning, Speech Synthesis	Code Available	0
Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration	May 25, 2023	Speech Synthesis, text-to-speech	Code Available	1
ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models	May 23, 2023	Speech Synthesis, text-to-speech	Unverified	0
VAKTA-SETU: A Speech-to-Speech Machine Translation Service in Select Indic Languages	May 21, 2023	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0

Datasets: LJSpeech, 20000 utterances, CMUDict 0.7b, HUI speech corpus, Thorsten voice 21.02 neutral, Trinity Speech-Gesture Dataset

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	NaturalSpeech	Audio Quality MOS	4.56	—	Unverified
2	VITS	Audio Quality MOS	4.43	—	Unverified
3	Grad-TTS + HiFiGAN (1000 steps)	Audio Quality MOS	4.37	—	Unverified
4	FastSpeech 2 + HiFiGAN	Audio Quality MOS	4.34	—	Unverified
5	Glow-TTS + HiFiGAN	Audio Quality MOS	4.34	—	Unverified
6	FastSpeech 2 + HiFiGAN	Audio Quality MOS	4.32	—	Unverified
7	FastDiff (4 steps)	Audio Quality MOS	4.28	—	Unverified
8	FastDiff-TTS	Audio Quality MOS	4.03	—	Unverified
9	Transformer TTS (Mel + WaveGlow)	Audio Quality MOS	3.88	—	Unverified
10	FastSpeech (Mel + WaveGlow)	Audio Quality MOS	3.84	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Mia	10-keyword Speech Commands dataset	16	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Token-Level Ensemble Distillation	Phoneme Error Rate	4.6	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	3.74	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	3.49	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Match-TTSG	MOS	3.7	—	Unverified