| KazakhTTS: An Open-Source Kazakh Text-to-Speech Synthesis Dataset | Apr 17, 2021 | Speech Synthesis, text-to-speech | Code Available | 1 |
| Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining | Jan 30, 2023 | Language Modeling, Language Modelling | Code Available | 1 |
| Improving TTS for Shanghainese: Addressing Tone Sandhi via Word Segmentation | Jul 30, 2023 | text-to-speech, Text to Speech | Code Available | 1 |
| Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder | Nov 7, 2022 | Speech Synthesis, text-to-speech | Code Available | 1 |
| Improved Child Text-to-Speech Synthesis through Fastpitch-based Transfer Learning | Nov 7, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| HyperTTS: Parameter Efficient Adaptation in Text to Speech using Hypernetworks | Apr 6, 2024 | Domain Adaptation, Speech Synthesis | Code Available | 1 |
| An End-to-end Chinese Text Normalization Model based on Rule-guided Flat-Lattice Transformer | Mar 31, 2022 | Text Normalization, text-to-speech | Code Available | 1 |
| HM-Conformer: A Conformer-based audio deepfake detection system with hierarchical pooling and multi-level classification token aggregation methods | Sep 15, 2023 | Audio Deepfake Detection, DeepFake Detection | Code Available | 1 |
| IESTAC: English-Italian Parallel Corpus for End-to-End Speech-to-Text Machine Translation | Nov 1, 2020 | Dynamic Time Warping, Machine Translation | Code Available | 1 |
| ShiftySpeech: A Large-Scale Synthetic Speech Dataset with Distribution Shifts | Feb 8, 2025 | Benchmarking, Self-Supervised Learning | Code Available | 1 |
| Google Crowdsourced Speech Corpora and Related Open-Source Resources for Low-Resource Languages and Dialects: An Overview | Oct 14, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech | May 13, 2021 | Decoder, Speech Synthesis | Code Available | 1 |
| An Efficient Membership Inference Attack for the Diffusion Model by Proximal Initialization | May 26, 2023 | Audio Generation, Inference Attack | Code Available | 1 |
| FMFCC-A: A Challenging Mandarin Dataset for Synthetic Speech Detection | Oct 18, 2021 | Speech Synthesis, Synthetic Speech Detection | Code Available | 1 |
| From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer with Feedback Constraint | May 10, 2020 | Speaker Verification, Speech Synthesis | Code Available | 1 |
| g2pM: A Neural Grapheme-to-Phoneme Conversion Package for Mandarin Chinese Based on a New Open Benchmark Dataset | Apr 7, 2020 | Grapheme-to-Phoneme Conversion, Polyphone disambiguation | Code Available | 1 |
| GUIRoboTron-Speech: Towards Automated GUI Agents Based on Speech Instructions | Jun 10, 2025 | text-to-speech, Text to Speech | Code Available | 1 |
| AudioMarkBench: Benchmarking Robustness of Audio Watermarking | Jun 11, 2024 | Benchmarking, text-to-speech | Code Available | 1 |
| FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis | Oct 27, 2022 | Speech Synthesis, text-to-speech | Code Available | 1 |
| From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition | May 22, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data | Apr 20, 2021 | Decoder, text-to-speech | Code Available | 1 |
| Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models | May 21, 2025 | Bayesian Optimization, Speech Synthesis | Code Available | 1 |
| Fine-grained style control in Transformer-based Text-to-speech Synthesis | Oct 12, 2021 | Inductive Bias, Speech Synthesis | Code Available | 1 |
| Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search | May 22, 2020 | text-to-speech, Text to Speech | Code Available | 1 |
| FastPitch: Parallel Text-to-speech with Pitch Prediction | Jun 11, 2020 | Prediction, text-to-speech | Code Available | 1 |
| HiFi-WaveGAN: Generative Adversarial Network with Auxiliary Spectrogram-Phase Loss for High-Fidelity Singing Voice Generation | Oct 23, 2022 | Generative Adversarial Network, Singing Voice Synthesis | Code Available | 1 |
| AdaSpeech: Adaptive Text to Speech for Custom Voice | Mar 1, 2021 | text-to-speech, Text to Speech | Code Available | 1 |
| HUI-Audio-Corpus-German: A high quality TTS dataset | Jun 11, 2021 | Text Normalization, text-to-speech | Code Available | 1 |
| FastSpeech 2: Fast and High-Quality End-to-End Text to Speech | Jun 8, 2020 | Knowledge Distillation, Speech Synthesis | Code Available | 1 |
| Imaginary Voice: Face-styled Diffusion Model for Text-to-Speech | Feb 27, 2023 | Speech Synthesis, text-to-speech | Code Available | 1 |
| Automatic Prosody Annotation with Pre-Trained Text-Speech Model | Jun 16, 2022 | Speech Synthesis, text-to-speech | Code Available | 1 |
| Improving fairness for spoken language understanding in atypical speech with Text-to-Speech | Nov 16, 2023 | Data Augmentation, Fairness | Code Available | 1 |
| Accent Estimation of Japanese Words from Their Surfaces and Romanizations for Building Large Vocabulary Accent Dictionaries | Sep 21, 2020 | Sentence, text-to-speech | Code Available | 1 |
| In Other News: A Bi-style Text-to-speech Model for Synthesizing Newscaster Voice with Limited Data | Apr 4, 2019 | Speech Synthesis, text-to-speech | Code Available | 1 |
| ÌròyìnSpeech: A multi-purpose Yorùbá Speech Corpus | Jul 29, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech | Jan 1, 2021 | text-to-speech, Text to Speech | Code Available | 1 |
| FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis | Jun 29, 2021 | Speech Synthesis, text-to-speech | Code Available | 1 |
| Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis | May 12, 2020 | Speech Synthesis, Style Transfer | Code Available | 1 |
| Enhancing Speech Intelligibility in Text-To-Speech Synthesis using Speaking Style Conversion | Aug 13, 2020 | Speech Synthesis, text-to-speech | Code Available | 1 |
| Learning Arousal-Valence Representation from Categorical Emotion Labels of Speech | Nov 24, 2023 | Dimensionality Reduction, Emotion Classification | Code Available | 1 |
| Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding | Aug 12, 2020 | Speech Synthesis, text-to-speech | Code Available | 1 |
| Attentive Sequence-to-Sequence Learning for Diacritic Restoration of Yorùbá Language Text | Apr 3, 2018 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| End-to-end Lyrics Alignment for Polyphonic Music Using an Audio-to-Character Recognition Model | Feb 18, 2019 | Retrieval, text-to-speech | Code Available | 1 |
| ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet | Nov 29, 2021 | Spoken Language Understanding, text-to-speech | Code Available | 1 |
| Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech | Sep 21, 2023 | text-to-speech, Text to Speech | Code Available | 1 |
| Making More of Little Data: Improving Low-Resource Automatic Speech Recognition Using Data Augmentation | May 18, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| BASPRO: a balanced script producer for speech corpus collection based on the genetic algorithm | Dec 11, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| End-to-End Adversarial Text-to-Speech | Jun 5, 2020 | Adversarial Text, Dynamic Time Warping | Code Available | 1 |
| Attention model for articulatory features detection | Jul 2, 2019 | Manner Of Articulation Detection, model | Code Available | 1 |
| ADAPTERMIX: Exploring the Efficacy of Mixture of Adapters for Low-Resource TTS Adaptation | May 29, 2023 | Speech Synthesis, text-to-speech | Code Available | 1 |