Text-To-Speech Synthesis

Text-To-Speech Synthesis is a machine learning task that involves converting written text into spoken words. The goal is to generate synthetic speech that sounds natural and resembles human speech as closely as possible.

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 301–332 of 332 papers

Title	Date	Tasks	Status
Designing Language Technology Applications: A Wizard of Oz Driven Prototyping Framework	Apr 1, 2014	Machine TranslationSpeech Recognition	—Unverified
Development of Marathi Part of Speech Tagger Using Statistical Approach	Oct 2, 2013	Information RetrievalPart-Of-Speech Tagging	—Unverified
AutoTTS: End-to-End Text-to-Speech Synthesis through Differentiable Duration Modeling	Mar 21, 2022	DecoderSpeech Synthesis	—Unverified
Automatic Syllabification for Manipuri language	Dec 1, 2016	Automatic Speech Recognition (ASR)Segmentation	—Unverified
DNN-based Speech Synthesis for Indian Languages from ASCII text	Aug 18, 2016	Speech Synthesistext-to-speech	—Unverified
Dual Script E2E framework for Multilingual and Code-Switching ASR	Jun 2, 2021	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified
Do Prosody Transfer Models Transfer Prosody?	Mar 7, 2023	Speech Synthesistext-to-speech	—Unverified
Duration Modeling by Multi-Models based on Vowel Production characteristics	Dec 1, 2014	Speech SynthesisText-To-Speech Synthesis	—Unverified
DurIAN-E 2: Duration Informed Attention Network with Adaptive Variational Autoencoder and Adversarial Learning for Expressive Text-to-Speech Synthesis	Oct 17, 2024	Speech Synthesistext-to-speech	—Unverified
DurIAN-E: Duration Informed Attention Network For Expressive Text-to-Speech Synthesis	Sep 22, 2023	DenoisingSpeech Synthesis	—Unverified
Towards Multilingual Conversations in the Medical Domain: Development of Multilingual Medical Data and A Network-based ASR System	May 1, 2014	Machine Translationspeech-recognition	—Unverified
Effect of choice of probability distribution, randomness, and search methods for alignment modeling in sequence-to-sequence text-to-speech synthesis using hard alignment	Oct 28, 2019	Hard AttentionSpeech Synthesis	—Unverified
Efficient Generative Modeling with Residual Vector Quantization-Based Tokens	Dec 13, 2024	Conditional Image GenerationImage Generation	—Unverified
Towards Spontaneous Style Modeling with Semi-supervised Pre-training for Conversational Text-to-Speech Synthesis	Aug 31, 2023	Expressive Speech SynthesisSentence	—Unverified
Automatic Arabic Dialect Identification Systems for Written Texts: A Survey	Sep 26, 2020	Dialect IdentificationMachine Translation	—Unverified
Efficient training strategies for natural sounding speech synthesis and speaker adaptation based on FastPitch	Oct 9, 2024	Speech Synthesistext-to-speech	—Unverified
Automatically Acquiring Fine-Grained Information Status Distinctions in German	Jul 1, 2012	Coreference ResolutionSpeech Synthesis	—Unverified
Emphasized Accent Phrase Prediction from Text for Advertisement Text-To-Speech Synthesis	Dec 1, 2014	Speech Synthesistext-to-speech	—Unverified
End-to-End Feedback Loss in Speech Chain Framework via Straight-Through Estimator	Oct 31, 2018	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified
End-to-End Text-to-Speech using Latent Duration based on VQ-VAE	Oct 19, 2020	Speech Synthesistext-to-speech	—Unverified
Transformer-based Models of Text Normalization for Speech Applications	Feb 1, 2022	SentenceSpeech Synthesis	—Unverified
A Unified Transformer-based Framework for Duplex Text Normalization	Aug 23, 2021	Automatic Speech RecognitionAutomatic Speech Recognition (ASR)	—Unverified
Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback	Jun 2, 2024	Speech Synthesistext-to-speech	—Unverified
Environment Aware Text-to-Speech Synthesis	Oct 8, 2021	AttributeDisentanglement	—Unverified
EPIC TTS Models: Empirical Pruning Investigations Characterizing Text-To-Speech Models	Sep 22, 2022	Speech Synthesistext-to-speech	—Unverified
A unified sequence-to-sequence front-end model for Mandarin text-to-speech synthesis	Nov 11, 2019	Polyphone disambiguationSpeech Synthesis	—Unverified
Evaluating Long-form Text-to-Speech: Comparing the Ratings of Sentences and Paragraphs	Sep 9, 2019	FormSpeech Synthesis	—Unverified
Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model	May 16, 2024	HallucinationLanguage Modeling	—Unverified
Triple M: A Practical Text-to-speech Synthesis System With Multi-guidance Attention And Multi-band Multi-time LPCNet	Jan 30, 2021	CPUSentence	—Unverified
Exploring Transfer Learning for Urdu Speech Synthesis	Jun 1, 2022	Speech Synthesistext-to-speech	—Unverified
Fast Bootstrapping of Grapheme to Phoneme System for Under-resourced Languages - Application to the Iban Language	Oct 1, 2013	Speech RecognitionSpeech Synthesis	—Unverified
A unified front-end framework for English text-to-speech synthesis	May 18, 2023	Speech SynthesisText Normalization	—Unverified

Show:10 25 50

← PrevPage 7 of 7Next →

All datasets LJSpeech 20000 utterances CMUDict 0.7b HUI speech corpus Thorsten voice 21.02 neutral Trinity Speech-Gesture Dataset

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	NaturalSpeech	Audio Quality MOS	4.56	—	Unverified
2	VITS	Audio Quality MOS	4.43	—	Unverified
3	Grad-TTS + HiFiGAN (1000 steps)	Audio Quality MOS	4.37	—	Unverified
4	FastSpeech 2 + HiFiGAN	Audio Quality MOS	4.34	—	Unverified
5	Glow-TTS + HiFiGAN	Audio Quality MOS	4.34	—	Unverified
6	FastSpeech 2 + HiFiGAN	Audio Quality MOS	4.32	—	Unverified
7	FastDiff (4 steps)	Audio Quality MOS	4.28	—	Unverified
8	FastDiff-TTS	Audio Quality MOS	4.03	—	Unverified
9	Transformer TTS (Mel + WaveGlow)	Audio Quality MOS	3.88	—	Unverified
10	FastSpeech (Mel + WaveGlow)	Audio Quality MOS	3.84	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Mia	10-keyword Speech Commands dataset	16	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Token-Level Ensemble Distillation	Phoneme Error Rate	4.6	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	3.74	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Tacotron 2	Mean Opinion Score	3.49	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	Match-TTSG	MOS	3.7	—	Unverified