Text to Speech

    from gtts import gTTS
    import os

    def text_to_speech_kurdish(text, output_file="output.mp3"):
        # Convert text to speech in Kurdish (language code "ku").
        # Assumption to verify: "ku" must appear in gtts.lang.tts_langs(),
        # otherwise gTTS rejects the language.
        tts = gTTS(text=text, lang='ku', slow=False)
        tts.save(output_file)
        os.system(f"start {output_file}")  # Open the audio file (Windows only)

    # Example (the text means "Hello, this is my voice in Kurdish."):
    text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")
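The snippet above opens the saved file with `start`, which only exists on Windows. A minimal sketch of a cross-platform alternative follows, using only the standard library. The helper names `player_command` and `open_audio` are hypothetical, and the sketch assumes `open` is available on macOS and `xdg-open` on Linux desktops.

```python
import platform
import subprocess


def player_command(path, system=None):
    """Return the argument list that opens `path` with the default application."""
    system = system or platform.system()
    if system == "Windows":
        # `start` is a cmd.exe builtin; the empty string is the window title.
        return ["cmd", "/c", "start", "", path]
    if system == "Darwin":
        # macOS
        return ["open", path]
    # Assumption: any other system is a Linux desktop with xdg-open installed.
    return ["xdg-open", path]


def open_audio(path):
    """Open the audio file; the launcher's exit status is not checked."""
    subprocess.run(player_command(path), check=False)
```

Passing the path as a separate list element, rather than interpolating it into a shell string, avoids problems with file names that contain spaces.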

Papers


Showing 101–150 of 1419 papers

Title	Date	Tasks	Status	Hype
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism	May 6, 2021	Generative Adversarial Network, Singing Voice Synthesis	Code Available	2
Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram	Oct 25, 2019	Generative Adversarial Network, GPU	Code Available	2
FastSpeech: Fast, Robust and Controllable Text-to-Speech	May 22, 2019	Decoder, text-to-speech	Code Available	2
FastSpeech: Fast, Robust and Controllable Text to Speech	May 22, 2019	Decoder, Speech Synthesis	Code Available	2
LPCNet: Improving Neural Speech Synthesis Through Linear Prediction	Oct 28, 2018	Prediction, Speech Synthesis	Code Available	2
Neural Speech Synthesis with Transformer Network	Sep 19, 2018	Decoder, Machine Translation	Code Available	2
Efficient Neural Audio Synthesis	Feb 23, 2018	Audio Synthesis, CPU	Code Available	2
InstructTTSEval: Benchmarking Complex Natural-Language Instruction Following in Text-to-Speech Systems	Jun 19, 2025	Benchmarking, Descriptive	Code Available	1
GUIRoboTron-Speech: Towards Automated GUI Agents Based on Speech Instructions	Jun 10, 2025	text-to-speech, Text to Speech	Code Available	1
UniTTS: An end-to-end TTS system without decoupling of acoustic and semantic information	May 23, 2025	Large Language Model, Quantization	Code Available	1
From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition	May 22, 2025	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	1
Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models	May 21, 2025	Bayesian Optimization, Speech Synthesis	Code Available	1
ShiftySpeech: A Large-Scale Synthetic Speech Dataset with Distribution Shifts	Feb 8, 2025	Benchmarking, Self-Supervised Learning	Code Available	1
Developing multilingual speech synthesis system for Ojibwe, Mi'kmaq, and Maliseet	Feb 4, 2025	Speech Synthesis, text-to-speech	Code Available	1
MathReader: Text-to-Speech for Mathematical Documents	Jan 13, 2025	Optical Character Recognition (OCR), text-to-speech	Code Available	1
Mitigating Unauthorized Speech Synthesis for Voice Protection	Oct 28, 2024	Data Augmentation, Face Swapping	Code Available	1
STTATTS: Unified Speech-To-Text And Text-To-Speech Model	Oct 24, 2024	Multi-Task Learning, speech-recognition	Code Available	1
Where are we in audio deepfake detection? A systematic analysis over generative and detection models	Oct 6, 2024	Audio Deepfake Detection, Audio Synthesis	Code Available	1
LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation	Sep 23, 2024	Language Modeling, Language Modelling	Code Available	1
PRESENT: Zero-Shot Text-to-Prosody Control	Aug 13, 2024	Prosody Prediction, Speech Synthesis	Code Available	1
ALIF: Low-Cost Adversarial Audio Attacks on Black-Box Speech Platforms using Linguistic Features	Aug 3, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	1
Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech	Jul 17, 2024	Speech-to-Speech Translation, text-to-speech	Code Available	1
E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS	Jun 26, 2024	text-to-speech, Text to Speech	Code Available	1
TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers	Jun 22, 2024	Decoder, Language Modeling	Code Available	1
AudioMarkBench: Benchmarking Robustness of Audio Watermarking	Jun 11, 2024	Benchmarking, text-to-speech	Code Available	1
XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model	Jun 7, 2024	text-to-speech, Text to Speech	Code Available	1
UMETTS: A Unified Framework for Emotional Text-to-Speech Synthesis with Multimodal Prompts	Apr 29, 2024	Contrastive Learning, Speech Synthesis	Code Available	1
USAT: A Universal Speaker-Adaptive Text-to-Speech Approach	Apr 28, 2024	Decoder, text-to-speech	Code Available	1
HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks	Apr 6, 2024	Domain Adaptation, Speech Synthesis	Code Available	1
KazEmoTTS: A Dataset for Kazakh Emotional Text-to-Speech Synthesis	Apr 1, 2024	Speech Synthesis, text-to-speech	Code Available	1
Brilla AI: AI Contestant for the National Science and Maths Quiz	Mar 4, 2024	Math, Question Answering	Code Available	1
Benchmarking Large Multimodal Models against Common Corruptions	Jan 22, 2024	Benchmarking, Image to text	Code Available	1
Multi-Task Learning for Front-End Text Processing in TTS	Jan 12, 2024	Language Modeling, Language Modelling	Code Available	1
Neural Text to Articulate Talk: Deep Text to Audiovisual Speech Synthesis achieving both Auditory and Photo-realism	Dec 11, 2023	Face Generation, Lip Reading	Code Available	1
Learning Arousal-Valence Representation from Categorical Emotion Labels of Speech	Nov 24, 2023	Dimensionality Reduction, Emotion Classification	Code Available	1
Improving fairness for spoken language understanding in atypical speech with Text-to-Speech	Nov 16, 2023	Data Augmentation, Fairness	Code Available	1
Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning	Nov 7, 2023	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	1
ArTST: Arabic Text and Speech Transformer	Oct 25, 2023	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	1
Crowdsourced and Automatic Speech Prominence Estimation	Oct 12, 2023	Emotion Recognition, text-to-speech	Code Available	1
Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech	Oct 1, 2023	speech-recognition, Speech Recognition	Code Available	1
BiSinger: Bilingual Singing Voice Synthesis	Sep 25, 2023	Singing Voice Synthesis, text-to-speech	Code Available	1
Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech	Sep 21, 2023	text-to-speech, Text to Speech	Code Available	1
Towards Joint Modeling of Dialogue Response and Speech Synthesis based on Large Language Model	Sep 20, 2023	Chatbot, Language Modeling	Code Available	1
HM-Conformer: A Conformer-based audio deepfake detection system with hierarchical pooling and multi-level classification token aggregation methods	Sep 15, 2023	Audio Deepfake Detection, DeepFake Detection	Code Available	1
Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of SSWP	Sep 11, 2023	text-to-speech, Text to Speech	Code Available	1
QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning	Aug 31, 2023	Representation Learning, Speech Representation Learning	Code Available	1
TextrolSpeech: A Text Style Control Speech Corpus With Codec Language Text-to-Speech Models	Aug 28, 2023	Language Modelling, text-to-speech	Code Available	1
Towards an AI to Win Ghana's National Science and Maths Quiz	Aug 8, 2023	Math, Question Answering	Code Available	1
Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation	Aug 3, 2023	Decoder, Quantization	Code Available	1
DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training	Jul 31, 2023	Denoising, Expressive Speech Synthesis	Code Available	1


No leaderboard results yet.