| Full-text Error Correction for Chinese Speech Recognition with Large Language Model | Sep 12, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Zero-Shot Text-to-Speech as Golden Speech Generator: A Systematic Framework and its Applicability in Automatic Pronunciation Assessment | Sep 11, 2024 | text-to-speech, Text to Speech | Unverified | 0 |
| D-CAPTCHA++: A Study of Resilience of Deepfake CAPTCHA under Transferable Imperceptible Adversarial Attack | Sep 11, 2024 | Adversarial Attack, Audio Synthesis | Unverified | 0 |
| Cross-Dialect Text-To-Speech in Pitch-Accent Language Incorporating Multi-Dialect Phoneme-Level BERT | Sep 11, 2024 | text-to-speech, Text to Speech | Unverified | 0 |
| VoiceWukong: Benchmarking Deepfake Voice Detection | Sep 10, 2024 | Benchmarking, Face Swapping | Unverified | 0 |
| Enhancing Kurdish Text-to-Speech with Native Corpus Training: A High-Quality WaveGlow Vocoder Approach | Sep 10, 2024 | Speech Synthesis, text-to-speech | Unverified | 0 |
| What happens to diffusion model likelihood when your model is conditional? | Sep 10, 2024 | domain classification, model | Unverified | 0 |
| AS-Speech: Adaptive Style For Speech Synthesis | Sep 9, 2024 | Rhythm, Speech Synthesis | Unverified | 0 |
| LAST: Language Model Aware Speech Tokenization | Sep 5, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Training Universal Vocoders with Feature Smoothing-Based Augmentation Methods for High-Quality TTS Systems | Sep 4, 2024 | text-to-speech, Text to Speech | Unverified | 0 |
| VoxHakka: A Dialectally Diverse Multi-speaker Text-to-Speech System for Taiwanese Hakka | Sep 3, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| A Framework for Synthetic Audio Conversations Generation using Large Language Models | Sep 2, 2024 | Audio Classification, Audio Tagging | Unverified | 0 |
| A multilingual training strategy for low resource Text to Speech | Sep 2, 2024 | Cross-Lingual Transfer, text-to-speech | Unverified | 0 |
| AASIST3: KAN-Enhanced AASIST Speech Deepfake Detection using SSL Features and Additional Regularization for the ASVspoof 2024 Challenge | Aug 30, 2024 | DeepFake Detection, Face Swapping | Unverified | 0 |
| SelectTTS: Synthesizing Anyone's Voice via Discrete Unit-Based Frame Selection | Aug 30, 2024 | Self-Supervised Learning, Speech Synthesis | Unverified | 0 |
| Multi-modal Adversarial Training for Zero-Shot Voice Cloning | Aug 28, 2024 | Decoder, text-to-speech | Unverified | 0 |
| Easy, Interpretable, Effective: openSMILE for voice deepfake detection | Aug 28, 2024 | DeepFake Detection, Face Swapping | Unverified | 0 |
| StyleSpeech: Parameter-efficient Fine Tuning for Pre-trained Controllable Text-to-Speech | Aug 27, 2024 | parameter-efficient fine-tuning, text-to-speech | Code Available | 0 |
| DualSpeech: Enhancing Speaker-Fidelity and Text-Intelligibility Through Dual Classifier-Free Guidance | Aug 26, 2024 | Diversity, text-to-speech | Unverified | 0 |
| SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models | Aug 25, 2024 | text-to-speech, Text to Speech | Unverified | 0 |
| Positional Description for Numerical Normalization | Aug 22, 2024 | speech-recognition, Speech Recognition | Unverified | 0 |
| Adversarial training of Keyword Spotting to Minimize TTS Data Overfitting | Aug 20, 2024 | Keyword Spotting, text-to-speech | Unverified | 0 |
| kNN Retrieval for Simple and Effective Zero-Shot Multi-speaker Text-to-Speech | Aug 20, 2024 | Retrieval, Self-Supervised Learning | Unverified | 0 |
| Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition | Aug 17, 2024 | Language Modeling, Language Modelling | Code Available | 0 |
| Style-Talker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation | Aug 13, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| SaSLaW: Dialogue Speech Corpus with Audio-visual Egocentric Information Toward Environment-adaptive Dialogue Speech Synthesis | Aug 13, 2024 | Speech Synthesis, Spoken Dialogue Systems | Code Available | 0 |
| FLEURS-R: A Restored Multilingual Speech Corpus for Generation Tasks | Aug 12, 2024 | Few-Shot Learning, text-to-speech | Unverified | 0 |
| VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing | Aug 11, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation | Aug 1, 2024 | Representation Learning, Speech Synthesis | Unverified | 0 |
| On the Problem of Text-To-Speech Model Selection for Synthetic Data Generation in Automatic Speech Recognition | Jul 31, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Speech Bandwidth Expansion Via High Fidelity Generative Adversarial Networks | Jul 26, 2024 | Generative Adversarial Network, Speech Enhancement | Unverified | 0 |
| On the Effect of Purely Synthetic Training Data for Different Automatic Speech Recognition Architectures | Jul 25, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Zero-Shot vs. Few-Shot Multi-Speaker TTS Using Pre-trained Czech SpeechT5 Model | Jul 24, 2024 | text-to-speech, Text to Speech | Unverified | 0 |
| Synth4Kws: Synthesized Speech for User Defined Keyword Spotting in Low Resource Environments | Jul 23, 2024 | Diversity, Keyword Spotting | Unverified | 0 |
| Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2 | Jul 19, 2024 | Audio Generation, Audio Synthesis | Unverified | 0 |
| Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models | Jul 18, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Handling Numeric Expressions in Automatic Speech Recognition | Jul 18, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| SpikeVoice: High-Quality Text-to-Speech Via Efficient Spiking Neural Network | Jul 17, 2024 | text-to-speech, Text to Speech | Code Available | 0 |
| A Language Modeling Approach to Diacritic-Free Hebrew TTS | Jul 16, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Learning High-Frequency Functions Made Easy with Sinusoidal Positional Encoding | Jul 12, 2024 | regression, text-to-speech | Code Available | 0 |
| Autoregressive Speech Synthesis without Vector Quantization | Jul 11, 2024 | Audio Compression, Diversity | Unverified | 0 |
| Source Tracing of Audio Deepfake Systems | Jul 10, 2024 | Face Swapping, text-to-speech | Unverified | 0 |
| Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation | Jul 7, 2024 | Text to Speech | Code Available | 0 |
| ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation | Jul 7, 2024 | Sentence, text-to-speech | Unverified | 0 |
| On the Effectiveness of Acoustic BPE in Decoder-Only TTS | Jul 4, 2024 | Decoder, Diversity | Unverified | 0 |
| Improving Accented Speech Recognition using Data Augmentation based on Unsupervised Text-to-Speech Synthesis | Jul 4, 2024 | Accented Speech Recognition, Automatic Speech Recognition | Unverified | 0 |
| Optimizing a-DCF for Spoofing-Robust Speaker Verification | Jul 4, 2024 | Speaker Verification, Text to Speech | Unverified | 0 |
| Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization | Jul 2, 2024 | Inference Optimization, Speech Synthesis | Unverified | 0 |
| TTSlow: Slow Down Text-to-Speech with Efficiency Robustness Evaluations | Jul 2, 2024 | Benchmarking, text-to-speech | Unverified | 0 |
| Lightweight Zero-shot Text-to-Speech with Mixture of Adapters | Jul 1, 2024 | Decoder, Speech Synthesis | Unverified | 0 |