Text to Speech

from gtts import gTTS
import os


def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku" selected for Kurdish).
    # Assumes gTTS accepts "ku"; an unsupported code raises ValueError.
    tts = gTTS(text=text, lang='ku', slow=False)
    tts.save(output_file)
    os.system(f"start {output_file}")  # Open the audio file (on Windows)


# Example (the sample text means "Hello, this is my voice in Kurdish."):
text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")
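The snippet above opens the saved file with `start`, which only exists on Windows. A minimal sketch of a portable alternative follows, using only the standard library. The helper names `build_open_command` and `open_audio_file` are hypothetical, and the fallback branch assumes a Linux-like desktop where `xdg-open` is installed.

```python
import subprocess
import sys


def build_open_command(path, platform=sys.platform):
    """Return the argument list that opens `path` with the default application."""
    if platform.startswith("win"):
        # `start` is a cmd.exe built-in; the empty string is the window title.
        return ["cmd", "/c", "start", "", path]
    if platform == "darwin":
        return ["open", path]
    # Assumption: any other platform is a Linux-like desktop with xdg-open.
    return ["xdg-open", path]


def open_audio_file(path):
    """Launch the platform's default player for `path`."""
    # An argument list (rather than a shell string) keeps file names
    # containing spaces intact.
    subprocess.run(build_open_command(path), check=False)
```

Calling `open_audio_file(output_file)` in place of the `os.system(...)` line would be one way to use it; passing `platform` explicitly makes the command selection easy to check without launching a player.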

Papers

Showing 401–450 of 1419 papers

Title	Date	Tasks	Status	Hype
Faces that Speak: Jointly Synthesising Talking Face and Speech from Text	May 16, 2024	Code Generation, Face Generation	Unverified	0
Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style Transfer	May 15, 2024	Adversarial Attack, Automatic Speech Recognition	Unverified	0
PolyGlotFake: A Novel Multilingual and Multimodal DeepFake Dataset	May 14, 2024	DeepFake Detection, Face Swapping	Code Available	0
Real-Time Pill Identification for the Visually Impaired Using Deep Learning	May 8, 2024	Deep Learning, Management	Unverified	0
Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech	Apr 30, 2024	Decoder, text-to-speech	Unverified	0
UMETTS: A Unified Framework for Emotional Text-to-Speech Synthesis with Multimodal Prompts	Apr 29, 2024	Contrastive Learning, Speech Synthesis	Code Available	1
USAT: A Universal Speaker-Adaptive Text-to-Speech Approach	Apr 28, 2024	Decoder, text-to-speech	Code Available	1
TI-ASU: Toward Robust Automatic Speech Understanding through Text-to-speech Imputation Against Missing Speech Modality	Apr 27, 2024	Imputation, text-to-speech	Unverified	0
StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations	Apr 23, 2024	text-to-speech, Text to Speech	Unverified	0
Retrieval-Augmented Audio Deepfake Detection	Apr 22, 2024	Audio Deepfake Detection, DeepFake Detection	Unverified	0
Prior-agnostic Multi-scale Contrastive Text-Audio Pre-training for Parallelized TTS Frontend Modeling	Apr 14, 2024	Polyphone disambiguation, Text Normalization	Unverified	0
Voice-Assisted Real-Time Traffic Sign Recognition System Using Convolutional Neural Network	Apr 11, 2024	Autonomous Vehicles, text-to-speech	Unverified	0
CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations	Apr 10, 2024	Dialogue Generation, text-to-speech	Code Available	2
Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness	Apr 10, 2024	Speech Synthesis, text-to-speech	Code Available	2
The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge	Apr 9, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
Cross-Domain Audio Deepfake Detection: Dataset and Analysis	Apr 7, 2024	Audio Deepfake Detection, DeepFake Detection	Unverified	0
HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks	Apr 6, 2024	Domain Adaptation, Speech Synthesis	Code Available	1
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis	Apr 4, 2024	Language Modeling, Language Modelling	Unverified	0
CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech	Apr 3, 2024	Language Modeling, Language Modelling	Unverified	0
PSCodec: A Series of High-Fidelity Low-bitrate Neural Speech Codecs Leveraging Prompt Encoders	Apr 3, 2024	Representation Learning, Speaker Verification	Unverified	0
KazEmoTTS: A Dataset for Kazakh Emotional Text-to-Speech Synthesis	Apr 1, 2024	Speech Synthesis, text-to-speech	Code Available	1
CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models	Mar 31, 2024	Denoising, Speech Synthesis	Code Available	2
Humane Speech Synthesis through Zero-Shot Emotion and Disfluency Generation	Mar 31, 2024	Language Modeling, Language Modelling	Code Available	0
A Review of Multi-Modal Large Language and Vision Models	Mar 28, 2024	Image Captioning, Prompt Engineering	Unverified	0
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild	Mar 25, 2024	Decoder, Language Modeling	Code Available	9
Isometric Neural Machine Translation using Phoneme Count Ratio Reward-based Reinforcement Learning	Mar 20, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
Creating an African American-Sounding TTS: Guidelines, Technical Challenges, and Surprising Evaluations	Mar 17, 2024	Attribute, text-to-speech	Unverified	0
EM-TTS: Efficiently Trained Low-Resource Mongolian Lightweight Text-to-Speech	Mar 13, 2024	GPU, Speech Synthesis	Unverified	0
Attempt Towards Stress Transfer in Speech-to-Speech Machine Translation	Mar 7, 2024	Diversity, Machine Translation	Unverified	0
AttentionStitch: How Attention Solves the Speech Editing Problem	Mar 5, 2024	text-to-speech, Text to Speech	Unverified	0
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models	Mar 5, 2024	Quantization, Speech Synthesis	Code Available	3
Brilla AI: AI Contestant for the National Science and Maths Quiz	Mar 4, 2024	Math, Question Answering	Code Available	1
Towards Accurate Lip-to-Speech Synthesis in-the-Wild	Mar 2, 2024	Language Modelling, Lip to Speech Synthesis	Unverified	0
Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data	Feb 29, 2024	Representation Learning, Speech Synthesis	Unverified	0
An Automated End-to-End Open-Source Software for High-Quality Text-to-Speech Dataset Generation	Feb 26, 2024	Dataset Generation, text-to-speech	Code Available	2
Efficient data selection employing Semantic Similarity-based Graph Structures for model training	Feb 22, 2024	Semantic Similarity, Semantic Textual Similarity	Unverified	0
Daisy-TTS: Simulating Wider Spectrum of Emotions via Prosody Embedding Decomposition	Feb 22, 2024	text-to-speech, Text to Speech	Unverified	0
On the Semantic Latent Space of Diffusion-Based Text-to-Speech Models	Feb 19, 2024	Denoising, Image Generation	Unverified	0
Bayesian Parameter-Efficient Fine-Tuning for Overcoming Catastrophic Forgetting	Feb 19, 2024	Language Modeling, Language Modelling	Code Available	0
Ain't Misbehavin' -- Using LLMs to Generate Expressive Robot Behavior in Conversations with the Tabletop Robot Haru	Feb 18, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech	Feb 14, 2024	Decoder, GPU	Unverified	0
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data	Feb 12, 2024	Decoder, Disentanglement	Unverified	0
Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like	Feb 12, 2024	text-to-speech, Text to Speech	Unverified	0
A New Approach to Voice Authenticity	Feb 9, 2024	text-to-speech, Text to Speech	Unverified	0
Paralinguistics-Aware Speech-Empowered Large Language Models for Natural Conversation	Feb 8, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Code Available	2
Enhancing the Stability of LLM-based Speech Generation Systems through Self-Supervised Representations	Feb 5, 2024	Decoder, In-Context Learning	Unverified	0
Natural language guidance of high-fidelity text-to-speech with synthetic annotations	Feb 2, 2024	In-Context Learning, Language Modeling	Code Available	9
PAM: Prompting Audio-Language Models for Audio Quality Assessment	Feb 1, 2024	Audio Quality Assessment, Music Generation	Code Available	2
Frame-Wise Breath Detection with Self-Training: An Exploration of Enhancing Breath Naturalness in Text-to-Speech	Feb 1, 2024	text-to-speech, Text to Speech	Unverified	0
MunTTS: A Text-to-Speech System for Mundari	Jan 28, 2024	Speech Synthesis, text-to-speech	Unverified	0

No leaderboard results yet.