
Text to Speech

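from gtts import gTTS
import os
import sys

def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish ("ku" is selected as the language code for Kurdish).
    # Note: by default gTTS checks `lang` against its supported languages
    # (see gtts.lang.tts_langs()) and raises ValueError if "ku" is not listed.
    tts = gTTS(text=text, lang='ku', slow=False)
    tts.save(output_file)
    # Open the audio file in the default player (Windows only).
    if sys.platform == "win32":
        os.startfile(output_file)

# Example ("Hello, this is my voice in the Kurdish language."):
text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")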
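The example above only opens the saved file on Windows. Below is a minimal cross-platform sketch. It assumes that macOS provides the `open` command and that Linux desktops provide the freedesktop `xdg-open` command. The helper names `open_command` and `play_file` are hypothetical and are not part of gTTS.

```python
import os
import subprocess
import sys


def open_command(path, platform=None):
    """Return the command list that opens `path` with the default app.

    Returns None on Windows, where os.startfile is used instead.
    (Hypothetical helper, not part of gTTS.)
    """
    platform = platform or sys.platform
    if platform == "win32":
        return None  # handled by os.startfile in play_file
    if platform == "darwin":
        return ["open", path]  # macOS default opener
    return ["xdg-open", path]  # assumed freedesktop opener on Linux desktops


def play_file(path):
    """Open an audio file (e.g. the saved output.mp3) in the default player."""
    cmd = open_command(path)
    if cmd is None:
        os.startfile(path)  # Windows-only API
    else:
        # An argument list avoids shell quoting problems with spaces in filenames.
        subprocess.run(cmd, check=False)
```

Under this sketch, calling `play_file("output.mp3")` would replace the Windows-only call at the end of `text_to_speech_kurdish`. It also sidesteps the quoting issue that `os.system(f"start {output_file}")` has with filenames containing spaces.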

Papers

Showing 601–650 of 1419 papers

Title | Status | Hype
Building a Luganda Text-to-Speech Model From Crowdsourced Data | | 0
Faces that Speak: Jointly Synthesising Talking Face and Speech from Text | | 0
Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style Transfer | | 0
PolyGlotFake: A Novel Multilingual and Multimodal DeepFake Dataset | Code | 0
Real-Time Pill Identification for the Visually Impaired Using Deep Learning | | 0
Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech | | 0
TI-ASU: Toward Robust Automatic Speech Understanding through Text-to-speech Imputation Against Missing Speech Modality | | 0
StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations | | 0
Retrieval-Augmented Audio Deepfake Detection | | 0
Prior-agnostic Multi-scale Contrastive Text-Audio Pre-training for Parallelized TTS Frontend Modeling | | 0
Voice-Assisted Real-Time Traffic Sign Recognition System Using Convolutional Neural Network | | 0
The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge | | 0
Cross-Domain Audio Deepfake Detection: Dataset and Analysis | | 0
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis | | 0
CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech | | 0
PSCodec: A Series of High-Fidelity Low-bitrate Neural Speech Codecs Leveraging Prompt Encoders | | 0
Humane Speech Synthesis through Zero-Shot Emotion and Disfluency Generation | Code | 0
A Review of Multi-Modal Large Language and Vision Models | | 0
Isometric Neural Machine Translation using Phoneme Count Ratio Reward-based Reinforcement Learning | | 0
Creating an African American-Sounding TTS: Guidelines, Technical Challenges, and Surprising Evaluations | | 0
EM-TTS: Efficiently Trained Low-Resource Mongolian Lightweight Text-to-Speech | | 0
Attempt Towards Stress Transfer in Speech-to-Speech Machine Translation | | 0
AttentionStitch: How Attention Solves the Speech Editing Problem | | 0
Towards Accurate Lip-to-Speech Synthesis in-the-Wild | | 0
Extending Multilingual Speech Synthesis to 100+ Languages without Transcribed Data | | 0
Efficient data selection employing Semantic Similarity-based Graph Structures for model training | | 0
Daisy-TTS: Simulating Wider Spectrum of Emotions via Prosody Embedding Decomposition | | 0
On the Semantic Latent Space of Diffusion-Based Text-to-Speech Models | | 0
Bayesian Parameter-Efficient Fine-Tuning for Overcoming Catastrophic Forgetting | Code | 0
Ain't Misbehavin' -- Using LLMs to Generate Expressive Robot Behavior in Conversations with the Tabletop Robot Haru | | 0
MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech | | 0
Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like | | 0
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data | | 0
A New Approach to Voice Authenticity | | 0
Enhancing the Stability of LLM-based Speech Generation Systems through Self-Supervised Representations | | 0
Frame-Wise Breath Detection with Self-Training: An Exploration of Enhancing Breath Naturalness in Text-to-Speech | | 0
MunTTS: A Text-to-Speech System for Mundari | | 0
VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech | | 0
Maximizing Data Efficiency for Cross-Lingual TTS Adaptation by Self-Supervised Representation Mixing and Embedding Initialization | | 0
Adversarial speech for voice privacy protection from Personalized Speech generation | | 0
Empowering Communication: Speech Technology for Indian and Western Accents through AI-powered Speech Synthesis | | 0
Data-driven grapheme-to-phoneme representations for a lexicon-free text-to-speech | | 0
MCMChaos: Improvising Rap Music with MCMC Methods and Chaos Theory | | 0
ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering | | 0
End to end Hindi to English speech conversion using Bark, mBART and a finetuned XLSR Wav2Vec2 | | 0
Noise-robust zero-shot text-to-speech synthesis conditioned on self-supervised speech-representation model with adapters | | 0
Evaluating and Personalizing User-Perceived Quality of Text-to-Speech Voices for Delivering Mindfulness Meditation with Different Physical Embodiments | | 0
Transfer the linguistic representations from TTS to accent conversion with non-parallel data | | 0
Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction | | 0
Incremental FastPitch: Chunk-based High Quality Text to Speech | | 0

No leaderboard results yet.