Text to Speech

from gtts import gTTS
import os

def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku" selected for Kurdish).
    # Assumption: the installed gTTS accepts "ku"; gtts.lang.tts_langs() lists the supported codes.
    tts = gTTS(text=text, lang='ku', slow=False)
    tts.save(output_file)
    os.system(f"start {output_file}")  # Open the audio file (Windows only)

# Example (the string means "Hello, this is my voice in Kurdish."):
text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")
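The snippet above opens the saved file with the Windows shell command "start", so it will not play the audio on macOS or Linux. A minimal sketch of a portable alternative follows. The helper names open_command and play_file are hypothetical, and the Linux branch assumes a desktop that provides xdg-open.

```python
import os
import subprocess
import sys


def open_command(path, platform=sys.platform):
    """Return the argv list that opens `path` with the default app, or None on Windows."""
    if platform.startswith("win"):
        return None  # Windows has no standalone opener binary; os.startfile is used instead
    if platform == "darwin":
        return ["open", path]  # macOS opener
    return ["xdg-open", path]  # assumption: a freedesktop-style Linux desktop


def play_file(path):
    """Open an audio file with the system default player."""
    cmd = open_command(path)
    if cmd is None:
        os.startfile(path)  # available only in Windows builds of Python
    else:
        subprocess.run(cmd, check=False)
```

Calling play_file(output_file) in place of the os.system line would keep the rest of the function unchanged.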

Papers


Showing 351–400 of 1419 papers

Title	Date	Tasks	Status	Hype
InterBiasing: Boost Unseen Word Recognition through Biasing Intermediate Predictions	Jun 21, 2024	speech-recognition, Speech Recognition	Unverified	0
DASB -- Discrete Audio and Speech Benchmark	Jun 20, 2024	Benchmarking, Emotion Recognition	Unverified	0
Instruction Data Generation and Unsupervised Adaptation for Speech Language Models	Jun 18, 2024	Synthetic Data Generation, text-to-speech	Unverified	0
DiTTo-TTS: Diffusion Transformers for Scalable Text-to-Speech without Domain-Specific Factors	Jun 17, 2024	text-to-speech, Text to Speech	Code Available	2
Multi-Scale Accent Modeling and Disentangling for Multi-Speaker Multi-Accent Text-to-Speech Synthesis	Jun 16, 2024	Disentanglement, Speech Synthesis	Unverified	0
Phoneme Discretized Saliency Maps for Explainable Detection of AI-Generated Voice	Jun 14, 2024	text-to-speech, Text to Speech	Unverified	0
DisfluencySpeech -- Single-Speaker Conversational Speech Dataset with Paralanguage	Jun 13, 2024	Sentence, text-to-speech	Unverified	0
DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing	Jun 13, 2024	Language Modeling, Language Modelling	Unverified	0
Audio-conditioned phonemic and prosodic annotation for building text-to-speech models from unlabeled speech data	Jun 12, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
VECL-TTS: Voice identity and Emotional style controllable Cross-Lingual Text-to-Speech	Jun 12, 2024	text-to-speech, Text to Speech	Unverified	0
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning	Jun 12, 2024	text-to-speech, Text to Speech	Code Available	2
VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment	Jun 12, 2024	Quantization, Speech Synthesis	Unverified	0
EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech	Jun 12, 2024	Emotional Speech Synthesis, text-to-speech	Code Available	2
Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data?	Jun 11, 2024	Contrastive Learning, Speech Synthesis	Unverified	0
AudioMarkBench: Benchmarking Robustness of Audio Watermarking	Jun 11, 2024	Benchmarking, text-to-speech	Code Available	1
Controlling Emotion in Text-to-Speech with Natural Language Prompts	Jun 10, 2024	text-to-speech, Text to Speech	Unverified	0
Meta Learning Text-to-Speech Synthesis in over 7000 Languages	Jun 10, 2024	Meta-Learning, Speech Synthesis	Unverified	0
MakeSinger: A Semi-Supervised Training Method for Data-Efficient Singing Voice Synthesis via Classifier-free Diffusion Guidance	Jun 10, 2024	Singing Voice Synthesis, text-to-speech	Unverified	0
Text-aware and Context-aware Expressive Audiobook Speech Synthesis	Jun 9, 2024	Contrastive Learning, Language Modeling	Unverified	0
WenetSpeech4TTS: A 12,800-hour Mandarin TTS Corpus for Large Speech Generation Model Benchmark	Jun 9, 2024	text-to-speech, Text to Speech	Code Available	2
An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS	Jun 9, 2024	Denoising, Speech Denoising	Unverified	0
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers	Jun 8, 2024	Speech Synthesis, text-to-speech	Unverified	0
Autoregressive Diffusion Transformer for Text-to-Speech Synthesis	Jun 8, 2024	Audio Generation, Decoder	Unverified	0
Boosting Diffusion Model for Spectrogram Up-sampling in Text-to-speech: An Empirical Study	Jun 7, 2024	Diversity, Language Modeling	Unverified	0
Spectral Codecs: Improving Non-Autoregressive Speech Synthesis with Spectrogram-Based Audio Codecs	Jun 7, 2024	Quantization, Speech Synthesis	Unverified	0
XTTS: a Massively Multilingual Zero-Shot Text-to-Speech Model	Jun 7, 2024	text-to-speech, Text to Speech	Code Available	1
A Human-in-the-Loop Approach to Improving Cross-Text Prosody Transfer	Jun 6, 2024	text-to-speech, Text to Speech	Unverified	0
Small-E: Small Language Model with Linear Attention for Efficient Speech Synthesis	Jun 6, 2024	Decoder, Inductive Bias	Code Available	2
Total-Duration-Aware Duration Modeling for Text-to-Speech Systems	Jun 6, 2024	Diversity, text-to-speech	Unverified	0
Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model	Jun 6, 2024	Language Modeling, Language Modelling	Unverified	0
Harder or Different? Understanding Generalization of Audio Deepfake Detection	Jun 5, 2024	Audio Deepfake Detection, DeepFake Detection	Unverified	0
Style Mixture of Experts for Expressive Text-To-Speech Synthesis	Jun 5, 2024	Mixture-of-Experts, Speech Synthesis	Unverified	0
Task Arithmetic can Mitigate Synthetic-to-Real Gap in Automatic Speech Recognition	Jun 5, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing	Jun 4, 2024	Decoder, Language Modeling	Unverified	0
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models	Jun 4, 2024	In-Context Learning, Language Modelling	Code Available	7
BiVocoder: A Bidirectional Neural Vocoder Integrating Feature Extraction and Waveform Generation	Jun 4, 2024	text-to-speech, Text to Speech	Unverified	0
Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis	Jun 4, 2024	In-Context Learning, Language Modeling	Unverified	0
ControlSpeech: Towards Simultaneous and Independent Zero-shot Speaker Cloning and Zero-shot Language Style Control	Jun 3, 2024	Speech Synthesis, text-to-speech	Code Available	3
Accent Conversion in Text-To-Speech Using Multi-Level VAE and Adversarial Training	Jun 3, 2024	Speech Synthesis, text-to-speech	Unverified	0
Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback	Jun 2, 2024	Speech Synthesis, text-to-speech	Unverified	0
Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities	May 29, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation	May 28, 2024	Machine Translation, speech-recognition	Code Available	2
Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition	May 24, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified	0
Multilingual Prosody Transfer: Comparing Supervised & Transfer Learning	May 23, 2024	Speech Synthesis, text-to-speech	Unverified	0
DLPO: Diffusion Model Loss-Guided Reinforcement Learning for Fine-Tuning Text-to-Speech Diffusion Models	May 23, 2024	Image Generation, reinforcement-learning	Unverified	0
Multi-speaker Text-to-speech Training with Speaker Anonymized Data	May 20, 2024	Speaker anonymization, text-to-speech	Unverified	0
VR-GPT: Visual Language Model for Intelligent Virtual Reality Applications	May 19, 2024	Language Modeling, Language Modelling	Unverified	0
Exploring speech style spaces with language models: Emotional TTS without emotion labels	May 18, 2024	text-to-speech, Text to Speech	Unverified	0
Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model	May 16, 2024	Hallucination, Language Modeling	Unverified	0
Building a Luganda Text-to-Speech Model From Crowdsourced Data	May 16, 2024	Speech Enhancement, text-to-speech	Unverified	0

