| Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models | May 21, 2025 | Bayesian Optimization, Speech Synthesis | Code Available | 1 |
| Multi-Task Learning for Front-End Text Processing in TTS | Jan 12, 2024 | Language Modeling, Language Modelling | Code Available | 1 |
| Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search | May 22, 2020 | text-to-speech, Text to Speech | Code Available | 1 |
| g2pM: A Neural Grapheme-to-Phoneme Conversion Package for Mandarin Chinese Based on a New Open Benchmark Dataset | Apr 7, 2020 | Grapheme-to-Phoneme Conversion, Polyphone disambiguation | Code Available | 1 |
| From Tens of Hours to Tens of Thousands: Scaling Back-Translation for Speech Recognition | May 22, 2025 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer with Feedback Constraint | May 10, 2020 | Speaker Verification, Speech Synthesis | Code Available | 1 |
| Google Crowdsourced Speech Corpora and Related Open-Source Resources for Low-Resource Languages and Dialects: An Overview | Oct 14, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| BiSinger: Bilingual Singing Voice Synthesis | Sep 25, 2023 | Singing Voice Synthesis, text-to-speech | Code Available | 1 |
| Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling | Oct 8, 2020 | Speech Recognition, text-to-speech | Code Available | 1 |
| One-class learning towards generalized voice spoofing detection | Oct 27, 2020 | Speaker Verification, text-to-speech | Code Available | 1 |
| MathReader : Text-to-Speech for Mathematical Documents | Jan 13, 2025 | Optical Character Recognition (OCR), text-to-speech | Code Available | 1 |
| FastSpeech 2: Fast and High-Quality End-to-End Text to Speech | Jun 8, 2020 | Knowledge Distillation, Speech Synthesis | Code Available | 1 |
| Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding | Aug 12, 2020 | Speech Synthesis, text-to-speech | Code Available | 1 |
| Attentive Sequence-to-Sequence Learning for Diacritic Restoration of Yorùbá Language Text | Apr 3, 2018 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| FastPitch: Parallel Text-to-speech with Pitch Prediction | Jun 11, 2020 | Prediction, text-to-speech | Code Available | 1 |
| FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis | Oct 27, 2022 | Speech Synthesis, text-to-speech | Code Available | 1 |
| Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech | Oct 1, 2023 | speech-recognition, Speech Recognition | Code Available | 1 |
| Attention model for articulatory features detection | Jul 2, 2019 | Manner Of Articulation Detection, model | Code Available | 1 |
| Brilla AI: AI Contestant for the National Science and Maths Quiz | Mar 4, 2024 | Math, Question Answering | Code Available | 1 |
| Proteno: Text Normalization with Limited Data for Fast Deployment in Text to Speech Systems | Apr 15, 2021 | Text Normalization, text-to-speech | Code Available | 1 |
| ADAPTERMIX: Exploring the Efficacy of Mixture of Adapters for Low-Resource TTS Adaptation | May 29, 2023 | Speech Synthesis, text-to-speech | Code Available | 1 |
| Evaluating Parameter-Efficient Transfer Learning Approaches on SURE Benchmark for Speech Understanding | Mar 2, 2023 | Speech Synthesis, text-to-speech | Code Available | 1 |
| FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis | Jun 29, 2021 | Speech Synthesis, text-to-speech | Code Available | 1 |
| Fine-grained style control in Transformer-based Text-to-speech Synthesis | Oct 12, 2021 | Inductive Bias, Speech Synthesis | Code Available | 1 |
| Enhancing Speech Intelligibility in Text-To-Speech Synthesis using Speaking Style Conversion | Aug 13, 2020 | Speech Synthesis, text-to-speech | Code Available | 1 |
| End-to-end Lyrics Alignment for Polyphonic Music Using an Audio-to-Character Recognition Model | Feb 18, 2019 | Retrieval, text-to-speech | Code Available | 1 |
| End-to-End Adversarial Text-to-Speech | Jun 5, 2020 | Adversarial Text, Dynamic Time Warping | Code Available | 1 |
| SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer | Jul 20, 2023 | Expressive Speech Synthesis, Language Modelling | Code Available | 1 |
| End to End Lip Synchronization with a Temporal AutoEncoder | Mar 30, 2022 | text-to-speech, Text to Speech | Code Available | 1 |
| Semi-Supervised Neural Architecture Search | Feb 24, 2020 | GPU, Natural Language Transduction | Code Available | 1 |
| EmoSpeech: Guiding FastSpeech2 Towards Emotional Text to Speech | Jun 28, 2023 | Emotion Recognition, Speech Synthesis | Code Available | 1 |
| EMNS /Imz/ Corpus: An emotive single-speaker dataset for narrative storytelling in games, television and graphic novels | May 22, 2023 | Expressive Speech Synthesis, Speech Synthesis | Code Available | 1 |
| Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech | Sep 21, 2023 | text-to-speech, Text to Speech | Code Available | 1 |
| ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet | Nov 29, 2021 | Spoken Language Understanding, text-to-speech | Code Available | 1 |
| Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis | May 12, 2020 | Speech Synthesis, Style Transfer | Code Available | 1 |
| Effective Deep Learning Models for Automatic Diacritization of Arabic Text | Nov 1, 2020 | Arabic Text Diacritization, Decoder | Code Available | 1 |
| Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis with Graph-based Multi-modal Context Modeling | Jun 11, 2021 | Speech Synthesis, text-to-speech | Code Available | 1 |
| EdiTTS: Score-based Editing for Controllable Text-to-Speech | Oct 6, 2021 | Speech Synthesis, Speech-to-Text | Code Available | 1 |
| Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention | Oct 24, 2017 | text-to-speech, Text to Speech | Code Available | 1 |
| A Survey on Neural Speech Synthesis | Jun 29, 2021 | Speech Synthesis, Survey | Code Available | 1 |
| Can we use Common Voice to train a Multi-Speaker TTS system? | Oct 12, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| TacoLM: GaTed Attention Equipped Codec Language Model are Efficient Zero-Shot Text to Speech Synthesizers | Jun 22, 2024 | Decoder, Language Modeling | Code Available | 1 |
| E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS | Jun 26, 2024 | text-to-speech, Text to Speech | Code Available | 1 |
| EditSpeech: A Text Based Speech Editing System Using Partial Inference and Bidirectional Fusion | Jul 4, 2021 | text-to-speech, Text to Speech | Code Available | 1 |
| Text + Sketch: Image Compression at Ultra Low Rates | Jul 4, 2023 | Image Compression, Text to Speech | Code Available | 1 |
| Diffusion-Based Mel-Spectrogram Enhancement for Personalized Speech Synthesis with Found Data | May 18, 2023 | Speech Enhancement, Speech Synthesis | Code Available | 1 |
| Dreamento: an open-source dream engineering toolbox for sleep EEG wearables | Jul 8, 2022 | EEG, Electroencephalogram (EEG) | Code Available | 1 |
| DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training | Jul 31, 2023 | Denoising, Expressive Speech Synthesis | Code Available | 1 |
| EfficientSpeech: An On-Device Text to Speech Model | May 23, 2023 | CPU, model | Code Available | 1 |
| Deep Learning Based Assessment of Synthetic Speech Naturalness | Apr 23, 2021 | Deep Learning, Prediction | Code Available | 1 |