Text to Speech

from gtts import gTTS
import os

def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku").
    # Whether "ku" is accepted depends on the installed gTTS release.
    tts = gTTS(text=text, lang='ku', slow=False)
    tts.save(output_file)
    os.system(f"start {output_file}")  # Open the audio file (Windows only)

# Example (the text means "Hello, this is my voice in Kurdish."):
text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")
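The snippet above passes the language code "ku" to gTTS, and that code may not be available in every gTTS release. The following is a minimal sketch of checking the code before synthesis, assuming gTTS is installed and that `gtts.lang.tts_langs()` returns a dict mapping language codes to names. The helper names `ensure_supported` and `available_langs` are hypothetical and not part of gTTS.

```python
def ensure_supported(lang, supported):
    """Return lang if it is a key of supported, otherwise raise ValueError.

    supported is any mapping of language code -> language name.
    """
    if lang not in supported:
        raise ValueError(f"Language code not supported: {lang}")
    return lang


def available_langs():
    """Return the language codes known to the installed gTTS release.

    The import is done lazily so that ensure_supported can be used
    (and tested) without gTTS being installed.
    """
    from gtts.lang import tts_langs
    return tts_langs()
```

A caller could run `ensure_supported("ku", available_langs())` before constructing `gTTS(...)`, which reports a clear error up front if the code is missing.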

Papers

Showing 101–150 of 1419 papers

Title	Date	Tasks	Status	Hype
PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-controllable TTS	Feb 24, 2023	Decoder, text-to-speech	Code Available	2
Nix-TTS: Lightweight and End-to-End Text-to-Speech via Module-wise Distillation	Mar 29, 2022	CPU, Decoder	Code Available	2
A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech	Feb 8, 2023	Code Generation, Diversity	Code Available	2
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality	May 9, 2022	Sentence, Speech Synthesis	Code Available	2
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers	Apr 18, 2023	In-Context Learning, Speech Synthesis	Code Available	2
Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment	May 26, 2025	text-to-speech, Text to Speech	Code Available	2
RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching	Jun 20, 2025	Scheduling, Speech Synthesis	Code Available	2
Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation	Jun 6, 2021	text-to-speech, Text to Speech	Code Available	1
MathReader : Text-to-Speech for Mathematical Documents	Jan 13, 2025	Optical Character Recognition (OCR), text-to-speech	Code Available	1
Miipher: A Robust Speech Restoration Model Integrating Self-Supervised Speech and Text Representations	Mar 3, 2023	Speech Denoising, Speech Enhancement	Code Available	1
LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation	Sep 23, 2024	Language Modeling, Language Modelling	Code Available	1
Limited Data Emotional Voice Conversion Leveraging Text-to-Speech: Two-stage Sequence-to-Sequence Training	Mar 31, 2021	text-to-speech, Text to Speech	Code Available	1
Making More of Little Data: Improving Low-Resource Automatic Speech Recognition Using Data Augmentation	May 18, 2023	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	1
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining	Jan 30, 2023	Language Modeling, Language Modelling	Code Available	1
ShiftySpeech: A Large-Scale Synthetic Speech Dataset with Distribution Shifts	Feb 8, 2025	Benchmarking, Self-Supervised Learning	Code Available	1
Learning Arousal-Valence Representation from Categorical Emotion Labels of Speech	Nov 24, 2023	Dimensionality Reduction, Emotion Classification	Code Available	1
Learning to Dub Movies via Hierarchical Prosody Models	Dec 8, 2022	text-to-speech, Text to Speech	Code Available	1
KazakhTTS: An Open-Source Kazakh Text-to-Speech Synthesis Dataset	Apr 17, 2021	Speech Synthesis, text-to-speech	Code Available	1
JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech	Mar 31, 2022	text-to-speech, Text to Speech	Code Available	1
KazEmoTTS: A Dataset for Kazakh Emotional Text-to-Speech Synthesis	Apr 1, 2024	Speech Synthesis, text-to-speech	Code Available	1
ÌròyìnSpeech: A multi-purpose Yorùbá Speech Corpus	Jul 29, 2023	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	1
ALIF: Low-Cost Adversarial Audio Attacks on Black-Box Speech Platforms using Linguistic Features	Aug 3, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	1
Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation	Aug 3, 2023	Decoder, Quantization	Code Available	1
Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech	Jul 17, 2024	Speech-to-Speech Translation, text-to-speech	Code Available	1
LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search	Feb 8, 2021	CPU, Model Compression	Code Available	1
Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech	Nov 7, 2021	Meta-Learning, Speech Synthesis	Code Available	1
Mitigating Unauthorized Speech Synthesis for Voice Protection	Oct 28, 2024	Data Augmentation, Face Swapping	Code Available	1
Improving TTS for Shanghainese: Addressing Tone Sandhi via Word Segmentation	Jul 30, 2023	text-to-speech, Text to Speech	Code Available	1
Improving fairness for spoken language understanding in atypical speech with Text-to-Speech	Nov 16, 2023	Data Augmentation, Fairness	Code Available	1
Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech	Feb 27, 2023	Speech Synthesis, text-to-speech	Code Available	1
IESTAC: English-Italian Parallel Corpus for End-to-End Speech-to-Text Machine Translation	Nov 1, 2020	Dynamic Time Warping, Machine Translation	Code Available	1
Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning	Nov 7, 2023	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	1
HM-Conformer: A Conformer-based audio deepfake detection system with hierarchical pooling and multi-level classification token aggregation methods	Sep 15, 2023	Audio Deepfake Detection, DeepFake Detection	Code Available	1
HiFi-WaveGAN: Generative Adversarial Network with Auxiliary Spectrogram-Phase Loss for High-Fidelity Singing Voice Generation	Oct 23, 2022	Generative Adversarial Network, Singing Voice Synthesis	Code Available	1
HUI-Audio-Corpus-German: A high quality TTS dataset	Jun 11, 2021	Text Normalization, text-to-speech	Code Available	1
GUIRoboTron-Speech: Towards Automated GUI Agents Based on Speech Instructions	Jun 10, 2025	text-to-speech, Text to Speech	Code Available	1
A Character-level Span-based Model for Mandarin Prosodic Structure Prediction	Mar 31, 2022	Sentence, text-to-speech	Code Available	1
HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks	Apr 6, 2024	Domain Adaptation, Speech Synthesis	Code Available	1
In Other News: A Bi-style Text-to-speech Model for Synthesizing Newscaster Voice with Limited Data	Apr 4, 2019	Speech Synthesis, text-to-speech	Code Available	1
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search	May 22, 2020	text-to-speech, Text to Speech	Code Available	1
g2pM: A Neural Grapheme-to-Phoneme Conversion Package for Mandarin Chinese Based on a New Open Benchmark Dataset	Apr 7, 2020	Grapheme-to-Phoneme Conversion, Polyphone disambiguation	Code Available	1
Google Crowdsourced Speech Corpora and Related Open-Source Resources for Low-Resource Languages and Dialects: An Overview	Oct 14, 2020	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	1
From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer with Feedback Constraint	May 10, 2020	Speaker Verification, Speech Synthesis	Code Available	1
FMFCC-A: A Challenging Mandarin Dataset for Synthetic Speech Detection	Oct 18, 2021	Speech Synthesis, Synthetic Speech Detection	Code Available	1
From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition	May 22, 2025	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	1
Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding	Aug 12, 2020	Speech Synthesis, text-to-speech	Code Available	1
Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning	Jun 15, 2022	Attribute, Emotion Classification	Code Available	1
Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech	May 13, 2021	Decoder, Speech Synthesis	Code Available	1
InstructTTSEval: Benchmarking Complex Natural-Language Instruction Following in Text-to-Speech Systems	Jun 19, 2025	Benchmarking, Descriptive	Code Available	1
Mixer-TTS: non-autoregressive, fast and compact text-to-speech model conditioned on language model embeddings	Oct 7, 2021	Language Modeling, Language Modelling	Code Available	1

No leaderboard results yet.