ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching | Jun 16, 2025 | Decoder Speech Synthesis | Code Available | 4
S2ST-Omni: An Efficient and Scalable Multilingual Speech-to-Speech Translation Framework via Seamless Speech-Text Alignment and Streaming Speech Generation | Jun 11, 2025 | Reading Comprehension Speech Synthesis | Unverified | 0
A Novel Data Augmentation Approach for Automatic Speaking Assessment on Opinion Expressions | Jun 4, 2025 | Data Augmentation Diversity | Unverified | 0
CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech | Jun 3, 2025 | Speech Synthesis text-to-speech | Unverified | 0
SALF-MOS: Speaker Agnostic Latent Features Downsampled for MOS Prediction | Jun 2, 2025 | Speech Synthesis text-to-speech | Unverified | 0
Chain-of-Thought Training for Open E2E Spoken Dialogue Systems | May 31, 2025 | Language Modeling Language Modelling | Unverified | 0
Zero-Shot Streaming Text to Speech Synthesis with Transducer and Auto-Regressive Modeling | May 26, 2025 | Sentence Speech Synthesis | Unverified | 0
Revival with Voice: Multi-modal Controllable Text-to-Speech Synthesis | May 25, 2025 | Speech Synthesis text-to-speech | Unverified | 0
Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models | May 21, 2025 | Bayesian Optimization Speech Synthesis | Code Available | 1
FMSD-TTS: Few-shot Multi-Speaker Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation | May 20, 2025 | Dataset Generation Speech Synthesis | Unverified | 0
Shallow Flow Matching for Coarse-to-Fine Text-to-Speech Synthesis | May 18, 2025 | Speech Synthesis text-to-speech | Unverified | 0
Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications | May 12, 2025 | Speech Synthesis text-to-speech | Unverified | 0
A Multi-Agent Framework for Automated Qinqiang Opera Script Generation Using Large Language Models | Apr 22, 2025 | cross-modal alignment Script Generation | Unverified | 0
AutoStyle-TTS: Retrieval-Augmented Generation based Automatic Style Matching Text-to-Speech Synthesis | Apr 14, 2025 | RAG Retrieval-augmented Generation | Unverified | 0
Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis | Apr 14, 2025 | Language Modeling Language Modelling | Unverified | 0
MoonCast: High-Quality Zero-Shot Podcast Generation | Mar 18, 2025 | Speech Synthesis text-to-speech | Code Available | 3
ASVspoof 5: Design, Collection and Validation of Resources for Spoofing, Deepfake, and Adversarial Attack Detection Using Crowdsourced Speech | Feb 13, 2025 | Adversarial Attack Adversarial Attack Detection | Unverified | 0
PROEMO: Prompt-Driven Text-to-Speech Synthesis Based on Emotion and Intensity Control | Jan 10, 2025 | Speech Synthesis text-to-speech | Unverified | 0
Low-Resource Text-to-Speech Synthesis Using Noise-Augmented Training of ForwardTacotron | Jan 10, 2025 | Speech Synthesis text-to-speech | Unverified | 0
Probing Speaker-specific Features in Speaker Representations | Jan 9, 2025 | Self-Supervised Learning Speaker Verification | Unverified | 0
Stable-TTS: Stable Speaker-Adaptive Text-to-Speech Synthesis via Prosody Prompting | Dec 28, 2024 | Speech Synthesis text-to-speech | Unverified | 0
Incremental Disentanglement for Environment-Aware Zero-Shot Text-to-Speech Synthesis | Dec 22, 2024 | Decoder Disentanglement | Unverified | 0
ProsodyFM: Unsupervised Phrasing and Intonation Control for Intelligible Speech Synthesis | Dec 16, 2024 | Speech Synthesis text-to-speech | Unverified | 0
Efficient Generative Modeling with Residual Vector Quantization-Based Tokens | Dec 13, 2024 | Conditional Image Generation Image Generation | Unverified | 0
Multimodal Latent Language Modeling with Next-Token Diffusion | Dec 11, 2024 | Image Generation Language Modeling | Code Available | 0
Debatts: Zero-Shot Debating Text-to-Speech Synthesis | Nov 10, 2024 | Speech Synthesis text-to-speech | Unverified | 0
Lina-Speech: Gated Linear Attention is a Fast and Parameter-Efficient Learner for text-to-speech synthesis | Oct 30, 2024 | Speech Synthesis text-to-speech | Code Available | 2
A Unified Framework for Collecting Text-to-Speech Synthesis Datasets for 22 Indian Languages | Oct 18, 2024 | Speech Synthesis text-to-speech | Unverified | 0
DurIAN-E 2: Duration Informed Attention Network with Adaptive Variational Autoencoder and Adversarial Learning for Expressive Text-to-Speech Synthesis | Oct 17, 2024 | Speech Synthesis text-to-speech | Unverified | 0
Efficient training strategies for natural sounding speech synthesis and speaker adaptation based on FastPitch | Oct 9, 2024 | Speech Synthesis text-to-speech | Unverified | 0
Bahasa Harmony: A Comprehensive Dataset for Bahasa Text-to-Speech Synthesis with Discrete Codec Modeling of EnGen-TTS | Oct 9, 2024 | Diversity Speech Synthesis | Unverified | 0
HALL-E: Hierarchical Neural Codec Language Model for Minute-Long Zero-Shot Text-to-Speech Synthesis | Oct 6, 2024 | Language Modeling Language Modelling | Unverified | 0
Generative Semantic Communication for Text-to-Speech Synthesis | Oct 4, 2024 | Quantization Semantic Communication | Unverified | 0
Accent conversion using discrete units with parallel data synthesized from controllable accented TTS | Sep 30, 2024 | Data Augmentation Speech Synthesis | Unverified | 0
StyleFusion TTS: Multimodal Style-control and Enhanced Feature Fusion for Zero-shot Text-to-speech Synthesis | Sep 24, 2024 | Speech Synthesis text-to-speech | Unverified | 0
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion | Sep 16, 2024 | Speech Synthesis text-to-speech | Unverified | 0
Text-To-Speech Synthesis In The Wild | Sep 13, 2024 | Benchmarking Speaker Recognition | Unverified | 0
Full-text Error Correction for Chinese Speech Recognition with Large Language Model | Sep 12, 2024 | Automatic Speech Recognition Automatic Speech Recognition (ASR) | Unverified | 0
SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis | Sep 11, 2024 | Decoder Speech Synthesis | Code Available | 2
What happens to diffusion model likelihood when your model is conditional? | Sep 10, 2024 | domain classification model | Unverified | 0
AS-Speech: Adaptive Style For Speech Synthesis | Sep 9, 2024 | Rhythm Speech Synthesis | Unverified | 0
Sample-Efficient Diffusion for Text-To-Speech Synthesis | Sep 1, 2024 | Language Modeling Language Modelling | Code Available | 2
Speech Bandwidth Expansion Via High Fidelity Generative Adversarial Networks | Jul 26, 2024 | Generative Adversarial Network Speech Enhancement | Unverified | 0
Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models | Jul 18, 2024 | Language Modeling Language Modelling | Unverified | 0
Autoregressive Speech Synthesis without Vector Quantization | Jul 11, 2024 | Audio Compression Diversity | Unverified | 0
Improving Accented Speech Recognition using Data Augmentation based on Unsupervised Text-to-Speech Synthesis | Jul 4, 2024 | Accented Speech Recognition Automatic Speech Recognition | Unverified | 0
Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization | Jul 2, 2024 | Inference Optimization Speech Synthesis | Unverified | 0
FLY-TTS: Fast, Lightweight and High-Quality End-to-End Text-to-Speech Synthesis | Jun 30, 2024 | CPU Decoder | Unverified | 0
Multi-Scale Accent Modeling and Disentangling for Multi-Speaker Multi-Accent Text-to-Speech Synthesis | Jun 16, 2024 | Disentanglement Speech Synthesis | Unverified | 0
VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment | Jun 12, 2024 | Quantization Speech Synthesis | Unverified | 0