
Text to Speech

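```python
import os
import sys

from gtts import gTTS  # third-party package: pip install gTTS


def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku").
    # gTTS raises ValueError if "ku" is not listed in gtts.lang.tts_langs().
    tts = gTTS(text=text, lang="ku", slow=False)
    tts.save(output_file)
    # Open the saved audio file with the default player (Windows only).
    if sys.platform == "win32":
        os.startfile(output_file)


if __name__ == "__main__":
    # Example ("Hello, this is my voice in the Kurdish language."):
    text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")
```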
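Opening the saved MP3 is platform-specific, and the snippet above handles only Windows. The following is a minimal cross-platform sketch, assuming macOS provides the `open` command and a Linux desktop provides `xdg-open` (from xdg-utils). The helper names `player_command` and `play_audio` are hypothetical and are not part of gTTS.

```python
import os
import subprocess
import sys


def player_command(path, system=sys.platform):
    """Return the command that opens `path` in the OS default player.

    Returns None on Windows, where os.startfile() is used instead.
    """
    if system == "win32":
        return None
    if system == "darwin":
        return ["open", path]  # macOS
    # Assumption: a Linux/BSD desktop with xdg-utils installed.
    return ["xdg-open", path]


def play_audio(path):
    """Open `path` with the default player without waiting for it to finish."""
    cmd = player_command(path)
    if cmd is None:
        os.startfile(path)  # Windows-only API
    else:
        subprocess.Popen(cmd)  # returns immediately
```

With this helper, `play_audio(output_file)` can replace the Windows-only playback step inside `text_to_speech_kurdish`. Keeping the command selection in a separate pure function makes the platform logic easy to check without actually launching a player.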

Papers

Showing 551–600 of 1419 papers

Title | Status | Hype
--- | --- | ---
NAIST Simultaneous Speech Translation System for IWSLT 2024 |  | 0
FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech Synthesis |  | 0
Open-Source Conversational AI with SpeechBrain 1.0 |  | 0
Application of ASV for Voice Identification after VC and Duration Predictor Improvement in TTS Models |  | 0
Automatic Speech Recognition for Hindi |  | 0
LLM-Driven Multimodal Opinion Expression Identification |  | 0
High Fidelity Text-to-Speech Via Discrete Tokens Using Token Transducer and Group Masked Language Model |  | 0
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment |  | 0
Leveraging Parameter-Efficient Transfer Learning for Multi-Lingual Text-to-Speech Adaptation |  | 0
Towards Zero-Shot Text-To-Speech for Arabic Dialects |  | 0
A multi-speaker multi-lingual voice cloning system based on vits2 for limmits 2024 challenge |  | 0
InterBiasing: Boost Unseen Word Recognition through Biasing Intermediate Predictions |  | 0
DASB -- Discrete Audio and Speech Benchmark |  | 0
Instruction Data Generation and Unsupervised Adaptation for Speech Language Models |  | 0
Multi-Scale Accent Modeling and Disentangling for Multi-Speaker Multi-Accent Text-to-Speech Synthesis |  | 0
Phoneme Discretized Saliency Maps for Explainable Detection of AI-Generated Voice |  | 0
DubWise: Video-Guided Speech Duration Control in Multimodal LLM-based Text-to-Speech for Dubbing |  | 0
DisfluencySpeech -- Single-Speaker Conversational Speech Dataset with Paralanguage |  | 0
VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment |  | 0
VECL-TTS: Voice identity and Emotional style controllable Cross-Lingual Text-to-Speech |  | 0
Audio-conditioned phonemic and prosodic annotation for building text-to-speech models from unlabeled speech data |  | 0
Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data? |  | 0
Meta Learning Text-to-Speech Synthesis in over 7000 Languages |  | 0
MakeSinger: A Semi-Supervised Training Method for Data-Efficient Singing Voice Synthesis via Classifier-free Diffusion Guidance |  | 0
Controlling Emotion in Text-to-Speech with Natural Language Prompts |  | 0
Text-aware and Context-aware Expressive Audiobook Speech Synthesis |  | 0
An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS |  | 0
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers |  | 0
Autoregressive Diffusion Transformer for Text-to-Speech Synthesis |  | 0
Boosting Diffusion Model for Spectrogram Up-sampling in Text-to-speech: An Empirical Study |  | 0
Spectral Codecs: Improving Non-Autoregressive Speech Synthesis with Spectrogram-Based Audio Codecs |  | 0
A Human-in-the-Loop Approach to Improving Cross-Text Prosody Transfer |  | 0
Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model |  | 0
Total-Duration-Aware Duration Modeling for Text-to-Speech Systems |  | 0
Harder or Different? Understanding Generalization of Audio Deepfake Detection |  | 0
Task Arithmetic can Mitigate Synthetic-to-Real Gap in Automatic Speech Recognition |  | 0
Style Mixture of Experts for Expressive Text-To-Speech Synthesis |  | 0
Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis |  | 0
BiVocoder: A Bidirectional Neural Vocoder Integrating Feature Extraction and Waveform Generation |  | 0
Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing |  | 0
Accent Conversion in Text-To-Speech Using Multi-Level VAE and Adversarial Training |  | 0
Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback |  | 0
Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities |  | 0
Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition |  | 0
DLPO: Diffusion Model Loss-Guided Reinforcement Learning for Fine-Tuning Text-to-Speech Diffusion Models |  | 0
Multilingual Prosody Transfer: Comparing Supervised & Transfer Learning |  | 0
Multi-speaker Text-to-speech Training with Speaker Anonymized Data |  | 0
VR-GPT: Visual Language Model for Intelligent Virtual Reality Applications |  | 0
Exploring speech style spaces with language models: Emotional TTS without emotion labels |  | 0
Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model |  | 0

No leaderboard results yet.