Text to Speech

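```python
import os
import sys

from gtts import gTTS  # the package is "gtts"; the class is gTTS


def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku" is selected for Kurdish).
    # gTTS raises ValueError if "ku" is not in gtts.lang.tts_langs() for the installed version.
    tts = gTTS(text=text, lang="ku", slow=False)
    tts.save(output_file)
    # Open the audio file with the default player (Windows only, like the "start" command).
    if sys.platform.startswith("win"):
        os.startfile(output_file)


# Example ("Hello, this is my voice in Kurdish."):
# text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")
```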
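The snippet above only plays the result on Windows. A minimal cross-platform sketch follows, assuming macOS provides `open` and Linux desktops provide `xdg-open` (from xdg-utils). The names `open_command`, `play_file`, and the generalized `text_to_speech` are hypothetical helpers, not part of gTTS. Only `gTTS(...).save()` and `gtts.lang.tts_langs()` come from the gTTS library. Whether `"ku"` appears in `tts_langs()` depends on the installed gTTS version, so the function checks before synthesizing.

```python
import os
import subprocess
import sys

# Hypothetical helper names; these are not part of gTTS.


def open_command(path, platform=sys.platform):
    """Return the command that opens `path` with the OS default player.

    Returns None on Windows, where os.startfile() is used instead of a command.
    """
    if platform.startswith("win"):
        return None
    if platform == "darwin":
        return ["open", path]  # macOS
    # Assumption: a Linux/BSD desktop with xdg-utils installed.
    return ["xdg-open", path]


def play_file(path):
    """Open an audio file with the default player on the current OS."""
    cmd = open_command(path)
    if cmd is None:
        os.startfile(path)  # os.startfile exists only on Windows
    else:
        subprocess.run(cmd, check=False)


def text_to_speech(text, lang="ku", output_file="output.mp3", play=True):
    """Synthesize `text` with gTTS, save it as MP3, and optionally play it."""
    # Imported lazily so open_command() works without gTTS installed.
    from gtts import gTTS
    from gtts.lang import tts_langs

    # Fail early with a readable message if this gTTS version lacks `lang`.
    if lang not in tts_langs():
        raise ValueError(f"gTTS does not list language code {lang!r}")
    gTTS(text=text, lang=lang, slow=False).save(output_file)
    if play:
        play_file(output_file)
    return output_file
```

Keeping `open_command` a pure function separates the platform decision from the side effect. You can check it without playing audio, and `play=False` lets batch jobs write MP3 files without opening a player.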

Papers


Showing 551–600 of 1419 papers

Title	Date	Tasks	Status
NAIST Simultaneous Speech Translation System for IWSLT 2024	Jun 30, 2024	Speech-to-Speech Translation, Speech-to-Text	Unverified
FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech Synthesis	Jun 30, 2024	CPU, Decoder	Unverified
Open-Source Conversational AI with SpeechBrain 1.0	Jun 29, 2024	Language Modeling, Language Modelling	Unverified
Application of ASV for Voice Identification after VC and Duration Predictor Improvement in TTS Models	Jun 27, 2024	Speaker Verification, text-to-speech	Unverified
Automatic Speech Recognition for Hindi	Jun 26, 2024	Action Detection, Activity Detection	Unverified
LLM-Driven Multimodal Opinion Expression Identification	Jun 26, 2024	text-to-speech, Text to Speech	Unverified
High Fidelity Text-to-Speech Via Discrete Tokens Using Token Transducer and Group Masked Language Model	Jun 25, 2024	Computational Efficiency, Language Modeling	Unverified
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment	Jun 25, 2024	Decoder, Language Modeling	Unverified
Leveraging Parameter-Efficient Transfer Learning for Multi-Lingual Text-to-Speech Adaptation	Jun 25, 2024	Speech Synthesis, text-to-speech	Unverified
Towards Zero-Shot Text-To-Speech for Arabic Dialects	Jun 24, 2024	Dialect Identification, Speech Synthesis	Unverified
A multi-speaker multi-lingual voice cloning system based on vits2 for limmits 2024 challenge	Jun 22, 2024	Speech Synthesis, text-to-speech	Unverified
InterBiasing: Boost Unseen Word Recognition through Biasing Intermediate Predictions	Jun 21, 2024	speech-recognition, Speech Recognition	Unverified
DASB -- Discrete Audio and Speech Benchmark	Jun 20, 2024	Benchmarking, Emotion Recognition	Unverified
Instruction Data Generation and Unsupervised Adaptation for Speech Language Models	Jun 18, 2024	Synthetic Data Generation, text-to-speech	Unverified
Multi-Scale Accent Modeling and Disentangling for Multi-Speaker Multi-Accent Text-to-Speech Synthesis	Jun 16, 2024	Disentanglement, Speech Synthesis	Unverified
Phoneme Discretized Saliency Maps for Explainable Detection of AI-Generated Voice	Jun 14, 2024	text-to-speech, Text to Speech	Unverified
DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing	Jun 13, 2024	Language Modeling, Language Modelling	Unverified
DisfluencySpeech -- Single-Speaker Conversational Speech Dataset with Paralanguage	Jun 13, 2024	Sentence, text-to-speech	Unverified
VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment	Jun 12, 2024	Quantization, Speech Synthesis	Unverified
VECL-TTS: Voice identity and Emotional style controllable Cross-Lingual Text-to-Speech	Jun 12, 2024	text-to-speech, Text to Speech	Unverified
Audio-conditioned phonemic and prosodic annotation for building text-to-speech models from unlabeled speech data	Jun 12, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data?	Jun 11, 2024	Contrastive Learning, Speech Synthesis	Unverified
Meta Learning Text-to-Speech Synthesis in over 7000 Languages	Jun 10, 2024	Meta-Learning, Speech Synthesis	Unverified
MakeSinger: A Semi-Supervised Training Method for Data-Efficient Singing Voice Synthesis via Classifier-free Diffusion Guidance	Jun 10, 2024	Singing Voice Synthesis, text-to-speech	Unverified
Controlling Emotion in Text-to-Speech with Natural Language Prompts	Jun 10, 2024	text-to-speech, Text to Speech	Unverified
Text-aware and Context-aware Expressive Audiobook Speech Synthesis	Jun 9, 2024	Contrastive Learning, Language Modeling	Unverified
An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS	Jun 9, 2024	Denoising, Speech Denoising	Unverified
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers	Jun 8, 2024	Speech Synthesis, text-to-speech	Unverified
Autoregressive Diffusion Transformer for Text-to-Speech Synthesis	Jun 8, 2024	Audio Generation, Decoder	Unverified
Boosting Diffusion Model for Spectrogram Up-sampling in Text-to-speech: An Empirical Study	Jun 7, 2024	Diversity, Language Modeling	Unverified
Spectral Codecs: Improving Non-Autoregressive Speech Synthesis with Spectrogram-Based Audio Codecs	Jun 7, 2024	Quantization, Speech Synthesis	Unverified
A Human-in-the-Loop Approach to Improving Cross-Text Prosody Transfer	Jun 6, 2024	text-to-speech, Text to Speech	Unverified
Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model	Jun 6, 2024	Language Modeling, Language Modelling	Unverified
Total-Duration-Aware Duration Modeling for Text-to-Speech Systems	Jun 6, 2024	Diversity, text-to-speech	Unverified
Harder or Different? Understanding Generalization of Audio Deepfake Detection	Jun 5, 2024	Audio Deepfake Detection, DeepFake Detection	Unverified
Task Arithmetic can Mitigate Synthetic-to-Real Gap in Automatic Speech Recognition	Jun 5, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Style Mixture of Experts for Expressive Text-To-Speech Synthesis	Jun 5, 2024	Mixture-of-Experts, Speech Synthesis	Unverified
Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis	Jun 4, 2024	In-Context Learning, Language Modeling	Unverified
BiVocoder: A Bidirectional Neural Vocoder Integrating Feature Extraction and Waveform Generation	Jun 4, 2024	text-to-speech, Text to Speech	Unverified
Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing	Jun 4, 2024	Decoder, Language Modeling	Unverified
Accent Conversion in Text-To-Speech Using Multi-Level VAE and Adversarial Training	Jun 3, 2024	Speech Synthesis, text-to-speech	Unverified
Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback	Jun 2, 2024	Speech Synthesis, text-to-speech	Unverified
Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities	May 29, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition	May 24, 2024	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
DLPO: Diffusion Model Loss-Guided Reinforcement Learning for Fine-Tuning Text-to-Speech Diffusion Models	May 23, 2024	Image Generation, reinforcement-learning	Unverified
Multilingual Prosody Transfer: Comparing Supervised & Transfer Learning	May 23, 2024	Speech Synthesis, text-to-speech	Unverified
Multi-speaker Text-to-speech Training with Speaker Anonymized Data	May 20, 2024	Speaker anonymization, text-to-speech	Unverified
VR-GPT: Visual Language Model for Intelligent Virtual Reality Applications	May 19, 2024	Language Modeling, Language Modelling	Unverified
Exploring speech style spaces with language models: Emotional TTS without emotion labels	May 18, 2024	text-to-speech, Text to Speech	Unverified
Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model	May 16, 2024	Hallucination, Language Modeling	Unverified


