| End to End Lip Synchronization with a Temporal AutoEncoder | Mar 30, 2022 | text-to-speech, Text to Speech | Code Available | 1 | 5 |
| Semi-supervised URL Segmentation with Recurrent Neural Networks Pre-trained on Knowledge Graph Entities | Dec 1, 2020 | Chinese Word Segmentation, Speech Synthesis | Code Available | 1 | 5 |
| Where are we in audio deepfake detection? A systematic analysis over generative and detection models | Oct 6, 2024 | Audio Deepfake Detection, Audio Synthesis | Code Available | 1 | 5 |
| QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning | Aug 31, 2023 | Representation Learning, Speech Representation Learning | Code Available | 1 | 5 |
| SpeechLMScore: Evaluating speech generation using speech language model | Dec 8, 2022 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer | Jul 20, 2023 | Expressive Speech Synthesis, Language Modelling | Code Available | 1 | 5 |
| Diffusion-Based Mel-Spectrogram Enhancement for Personalized Speech Synthesis with Found Data | May 18, 2023 | Speech Enhancement, Speech Synthesis | Code Available | 1 | 5 |
| SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model | Apr 2, 2021 | Decoder, text-to-speech | Code Available | 1 | 5 |
| Developing multilingual speech synthesis system for Ojibwe, Mi'kmaq, and Maliseet | Feb 4, 2025 | Speech Synthesis, text-to-speech | Code Available | 1 | 5 |
| Dict-TTS: Learning to Pronounce with Prior Dictionary Knowledge for Text-to-Speech | Jun 5, 2022 | Polyphone disambiguation, text-to-speech | Code Available | 1 | 5 |
| Deep Learning Based Assessment of Synthetic Speech Naturalness | Apr 23, 2021 | Deep Learning, Prediction | Code Available | 1 | 5 |
| RWEN-TTS: Relation-aware Word Encoding Network for Natural Text-to-Speech Synthesis | Dec 15, 2022 | Relation, Speech Synthesis | Code Available | 1 | 5 |
| Crowdsourced and Automatic Speech Prominence Estimation | Oct 12, 2023 | Emotion Recognition, text-to-speech | Code Available | 1 | 5 |
| ArTST: Arabic Text and Speech Transformer | Oct 25, 2023 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 | 5 |
| Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech | Oct 8, 2021 | Emotion Interpretation, Expressive Speech Synthesis | Code Available | 1 | 5 |
| Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech | May 9, 2022 | Diversity, text-to-speech | Code Available | 1 | 5 |
| End-to-end Lyrics Alignment for Polyphonic Music Using an Audio-to-Character Recognition Model | Feb 18, 2019 | Retrieval, text-to-speech | Code Available | 1 | 5 |
| RyanSpeech: A Corpus for Conversational Text-to-Speech Synthesis | Jun 15, 2021 | speech-recognition, Speech Recognition | Code Available | 1 | 5 |
| DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training | Jul 31, 2023 | Denoising, Expressive Speech Synthesis | Code Available | 1 | 5 |
| Semi-Supervised Neural Architecture Search | Feb 24, 2020 | GPU, Natural Language Transduction | Code Available | 1 | 5 |
| Dreamento: an open-source dream engineering toolbox for sleep EEG wearables | Jul 8, 2022 | EEG, Electroencephalogram (EEG) | Code Available | 1 | 5 |
| QSpeech: Low-Qubit Quantum Speech Application Toolkit | May 26, 2022 | text-to-speech, Text to Speech | Code Available | 0 | 5 |
| Exact Prosody Cloning in Zero-Shot Multispeaker Text-to-Speech | Jun 24, 2022 | text-to-speech, Text to Speech | Code Available | 0 | 5 |
| PromptTTS: Controllable Text-to-Speech with Text Descriptions | Nov 22, 2022 | Decoder, Speech Synthesis | Code Available | 0 | 5 |
| Pretrained Speech Encoders and Efficient Fine-tuning Methods for Speech Translation: UPC at IWSLT 2022 | May 1, 2022 | Decoder, Knowledge Distillation | Code Available | 0 | 5 |
| Prosody Analysis of Audiobooks | Oct 10, 2023 | Attribute, Language Modeling | Code Available | 0 | 5 |
| PolyGlotFake: A Novel Multilingual and Multimodal DeepFake Dataset | May 14, 2024 | DeepFake Detection, Face Swapping | Code Available | 0 | 5 |
| AraSpot: Arabic Spoken Command Spotting | Mar 29, 2023 | Data Augmentation, Keyword Spotting | Code Available | 0 | 5 |
| Phrase break prediction with bidirectional encoder representations in Japanese text-to-speech synthesis | Apr 26, 2021 | Language Modeling, Language Modelling | Code Available | 0 | 5 |
| Predicting distributions with Linearizing Belief Networks | Nov 17, 2015 | Denoising, Facial expression generation | Code Available | 0 | 5 |
| A Fully Time-domain Neural Model for Subband-based Speech Synthesizer | Oct 12, 2018 | text-to-speech, Text to Speech | Code Available | 0 | 5 |
| Preparing an Endangered Language for the Digital Age: The Case of Judeo-Spanish | May 31, 2022 | Machine Translation, Speech Synthesis | Code Available | 0 | 5 |
| A Practical Guide to Logical Access Voice Presentation Attack Detection | Jan 10, 2022 | Artifact Detection, Speaker Verification | Code Available | 0 | 5 |
| On the Discrepancy between Density Estimation and Sequence Generation | Feb 17, 2020 | Density Estimation, Machine Translation | Code Available | 0 | 5 |
| Numbers Normalisation in the Inflected Languages: a Case Study of Polish | Aug 1, 2019 | text-to-speech, Text to Speech | Code Available | 0 | 5 |
| Applying Phonological Features in Multilingual Text-To-Speech | Oct 7, 2021 | Language Acquisition, text-to-speech | Code Available | 0 | 5 |
| ObamaNet: Photo-realistic lip-sync from text | Dec 6, 2017 | Constrained Lip-synchronization, text-to-speech | Code Available | 0 | 5 |
| A Comparative Study on Transformer vs RNN in Speech Applications | Sep 13, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 | 5 |
| Non-Autoregressive Neural Text-to-Speech | May 21, 2019 | text-to-speech, Text to Speech | Code Available | 0 | 5 |
| Naturalization of Text by the Insertion of Pauses and Filler Words | Nov 7, 2020 | Sentence, text-to-speech | Code Available | 0 | 5 |
| Multi-Source Spatial Knowledge Understanding for Immersive Visual Text-to-Speech | Oct 18, 2024 | object-detection, Object Detection | Code Available | 0 | 5 |
| Neural Voice Puppetry: Audio-driven Facial Reenactment | Dec 11, 2019 | Face Model, Neural Rendering | Code Available | 0 | 5 |
| RNN Approaches to Text Normalization: A Challenge | Oct 31, 2016 | Text Normalization, text-to-speech | Code Available | 0 | 5 |
| MLS: A Large-Scale Multilingual Dataset for Speech Research | Dec 7, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 | 5 |
| Mlphon: A Multifunctional Grapheme-Phoneme Conversion Tool Using Finite State Transducers | Sep 5, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 | 5 |
| Multi-modal and Multi-scale Spatial Environment Understanding for Immersive Visual Text-to-Speech | Dec 16, 2024 | text-to-speech, Text to Speech | Code Available | 0 | 5 |
| Meta Learning Text-to-Speech Synthesis in over 7000 Languages | Jun 10, 2024 | Meta-Learning, Speech Synthesis | Code Available | 0 | 5 |
| Massively Multilingual Neural Grapheme-to-Phoneme Conversion | Aug 4, 2017 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 | 5 |
| MelNet: A Generative Model for Audio in the Frequency Domain | Jun 4, 2019 | Audio Generation, Music Generation | Code Available | 0 | 5 |
| MaSS: A Large and Clean Multilingual Corpus of Sentence-aligned Spoken Utterances Extracted from the Bible | Jul 30, 2019 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 | 5 |