SOTAVerified

Text to Speech

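from gtts import gTTS
import os


def text_to_speech_kurdish(text, output_file="output.mp3"):
    # Convert text to speech in Kurdish (language code "ku" selects Kurdish).
    # Note: with the default lang_check=True, gTTS raises ValueError for codes
    # it does not list; gtts.lang.tts_langs() returns the supported set.
    tts = gTTS(text=text, lang='ku', slow=False)
    tts.save(output_file)
    os.system(f"start {output_file}")  # Open the audio file (Windows only)


# Example (the text reads "Hello, this is my voice in Kurdish."):
text_to_speech_kurdish("سڵاو، ئەمە دەنگی منە بە زمانی کوردی.")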
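The snippet above opens the saved file with the Windows-only `start` command. As a minimal sketch of a portable alternative, the standard-library code below picks an opener per platform and passes the file name as a separate argument, so names containing spaces are not split by the shell. The helper names `build_open_command` and `open_audio_file` are hypothetical and not part of gTTS, and the fallback assumes a desktop with `xdg-open` installed.

```python
import subprocess
import sys


def build_open_command(path, platform=sys.platform):
    """Return the argument list that opens `path` with the default application."""
    if platform.startswith("win"):
        # `start` is a cmd.exe built-in; the empty string is the window title.
        return ["cmd", "/c", "start", "", path]
    if platform == "darwin":
        return ["open", path]
    # Assumption: a Linux/BSD desktop where xdg-utils provides `xdg-open`.
    return ["xdg-open", path]


def open_audio_file(path):
    """Open the audio file in the default player (hypothetical helper)."""
    # check=False: a missing opener is reported via the return code, not an exception.
    return subprocess.run(build_open_command(path), check=False).returncode
```

Calling `open_audio_file(output_file)` in place of the `os.system(...)` line would keep the rest of the function unchanged.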

Papers

Showing 651-700 of 1419 papers

| Title | Status | Hype |
|---|---|---|
| Accented Text-to-Speech Synthesis with Limited Data | | 0 |
| Data Center Audio/Video Intelligence on Device (DAVID) -- An Edge-AI Platform for Smart-Toys | | 0 |
| Improved Prosodic Clustering for Multispeaker and Speaker-independent Phoneme-level Prosody Control | | 0 |
| Data Augmentation Methods for End-to-end Speech Recognition on Distant-Talk Scenarios | | 0 |
| DASB -- Discrete Audio and Speech Benchmark | | 0 |
| IMaSC -- ICFOSS Malayalam Speech Corpus | | 0 |
| DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech | | 0 |
| Analysis and Utilization of Entrainment on Acoustic and Emotion Features in User-agent Dialogue | | 0 |
| HybridNet: A Hybrid Neural Architecture to Speed-up Autoregressive Models | | 0 |
| Huqariq: A Multilingual Speech Corpus of Native Languages of Peru for Speech Recognition | | 0 |
| Daisy-TTS: Simulating Wider Spectrum of Emotions via Prosody Embedding Decomposition | | 0 |
| Huqariq: A Multilingual Speech Corpus of Native Languages of Peru for Speech Recognition | | 0 |
| Human-in-the-loop Speaker Adaptation for DNN-based Multi-speaker TTS | | 0 |
| Cycle-consistency training for end-to-end speech recognition | | 0 |
| Impact of Frame Rates on Speech Tokenizer: A Case Study on Mandarin and English | | 0 |
| Improve Cross-lingual Voice Cloning Using Low-quality Code-switched Data | | 0 |
| Human Detection of Political Speech Deepfakes across Transcripts, Audio, and Video | | 0 |
| Customizing Grapheme-to-Phoneme System for Non-Trivial Transcription Problems in Bangla Language | | 0 |
| Improve few-shot voice cloning using multi-modal learning | | 0 |
| Improving Accent Conversion with Reference Encoder and End-To-End Text-To-Speech | | 0 |
| AudioJailbreak: Jailbreak Attacks against End-to-End Large Audio-Language Models | | 0 |
| Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model | | 0 |
| Improving Code-Switching and Named Entity Recognition in ASR with Speech Editing based Data Augmentation | | 0 |
| Improving Contextual Recognition of Rare Words with an Alternate Spelling Prediction Model | | 0 |
| An Algorithm Based on Empirical Methods, for Text-to-Tuneful-Speech Synthesis of Sanskrit Verse | | 0 |
| Improving Deliberation by Text-Only and Semi-Supervised Training | | 0 |
| HMM-based data augmentation for E2E systems for building conversational speech synthesis systems | | 0 |
| Improving Grapheme-to-Phoneme Conversion through In-Context Knowledge Retrieval with Large Language Models | | 0 |
| CUIfy the XR: An Open-Source Package to Embed LLM-powered Conversational Agents in XR | | 0 |
| Improving Low Resource Code-switched ASR using Augmented Code-switched TTS | | 0 |
| Improving LPCNet-based Text-to-Speech with Linear Prediction-structured Mixture Density Network | | 0 |
| Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information | | 0 |
| Improving multi-speaker TTS prosody variance with a residual encoder and normalizing flows | | 0 |
| Improving Noise Robustness of LLM-based Zero-shot TTS via Discrete Acoustic Token Denoising | | 0 |
| Improving Performance of End-to-End ASR on Numeric Sequences | | 0 |
| Improving prosodic phrasing of Vietnamese text-to-speech systems | | 0 |
| Improving Prosody Modelling with Cross-Utterance BERT Embeddings for End-to-end Speech Synthesis | | 0 |
| Improving Readability for Automatic Speech Recognition Transcription | | 0 |
| HLTCOE JHU Submission to the Voice Privacy Challenge 2024 | | 0 |
| Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment | | 0 |
| Improving Speech-to-Speech Translation Through Unlabeled Text | | 0 |
| Improving the expressiveness of neural vocoding with non-affine Normalizing Flows | | 0 |
| Improving the quality of neural TTS using long-form content and multi-speaker multi-style modeling | | 0 |
| Cued Speech Generation Leveraging a Pre-trained Audiovisual Text-to-Speech Model | | 0 |
| Incorporating speaker embedding and post-filter network for improving speaker similarity of personalized speech synthesis system | | 0 |
| Incremental Disentanglement for Environment-Aware Zero-Shot Text-to-Speech Synthesis | | 0 |
| A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI | | 0 |
| Incremental Machine Speech Chain Towards Enabling Listening while Speaking in Real-time | | 0 |
| High Quality Streaming Speech Synthesis with Low, Sentence-Length-Independent Latency | | 0 |
| High-Quality Automatic Voice Over with Accurate Alignment: Supervision through Self-Supervised Discrete Speech Units | | 0 |

No leaderboard results yet.