| Explicit Intensity Control for Accented Text-to-speech | Oct 27, 2022 | speech-recognition, Speech Recognition | Unverified | 0 |
| Virtuoso: Massive Multilingual Speech-Text Joint Semi-Supervised Learning for Text-To-Speech | Oct 27, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Improving Speech-to-Speech Translation Through Unlabeled Text | Oct 26, 2022 | Machine Translation, speech-recognition | Unverified | 0 |
| Semi-Supervised Learning Based on Reference Model for Low-resource TTS | Oct 25, 2022 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Adapitch: Adaption Multi-Speaker Text-to-Speech Conditioned on Pitch Disentangling with Untranscribed Data | Oct 25, 2022 | Decoder, Disentanglement | Unverified | 0 |
| Efficiently Trained Low-Resource Mongolian Text-to-Speech System Based On FullConv-TTS | Oct 24, 2022 | Data Augmentation, GPU | Unverified | 0 |
| Low-Resource Multilingual and Zero-Shot Multispeaker TTS | Oct 21, 2022 | Meta-Learning, text-to-speech | Unverified | 0 |
| Adaptive re-calibration of channel-wise features for Adversarial Audio Classification | Oct 21, 2022 | Audio Classification, Face Swapping | Unverified | 0 |
| Generating Synthetic Speech from SpokenVocab for Speech Translation | Oct 15, 2022 | Data Augmentation, Machine Translation | Code Available | 0 |
| LeVoice ASR Systems for the ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge | Oct 14, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Anonymizing Speech with Generative Adversarial Networks to Preserve Speaker Privacy | Oct 13, 2022 | Generative Adversarial Network, Speaker anonymization | Unverified | 0 |
| Pre-Avatar: An Automatic Presentation Generation Framework Leveraging Talking Avatar | Oct 13, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| SQuId: Measuring Speech Naturalness in Many Languages | Oct 12, 2022 | Diversity, text-to-speech | Unverified | 0 |
| Adversarial Speaker-Consistency Learning Using Untranscribed Speech Data for Zero-Shot Multi-Speaker Text-to-Speech | Oct 12, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era | Oct 6, 2022 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Unsupervised Multi-scale Expressive Speaking Style Modeling with Hierarchical Context Information for Audiobook Speech Synthesis | Oct 1, 2022 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Facial Landmark Predictions with Applications to Metaverse | Sep 29, 2022 | Decoder, text-to-speech | Code Available | 0 |
| Multi-Task Adversarial Training Algorithm for Multi-Speaker Neural Text-to-Speech | Sep 26, 2022 | Generative Adversarial Network, text-to-speech | Unverified | 0 |
| EPIC TTS Models: Empirical Pruning Investigations Characterizing Text-To-Speech Models | Sep 22, 2022 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Controllable Accented Text-to-Speech Synthesis | Sep 22, 2022 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Using Rater and System Metadata to Explain Variance in the VoiceMOS Challenge 2022 Dataset | Sep 14, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| SANIP: Shopping Assistant and Navigation for the visually impaired | Sep 8, 2022 | Object, object-detection | Unverified | 0 |
| Non-Standard Vietnamese Word Detection and Normalization for Text-to-Speech | Sep 7, 2022 | Articles, Sentence | Unverified | 0 |
| Mlphon: A Multifunctional Grapheme-Phoneme Conversion Tool Using Finite State Transducers | Sep 5, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| Improving Contextual Recognition of Rare Words with an Alternate Spelling Prediction Model | Sep 2, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| Towards MOOCs for Lipreading: Using Synthetic Talking Heads to Train Humans in Lipreading at Scale | Aug 21, 2022 | Lipreading, Lip Reading | Unverified | 0 |
| Speech Synthesis with Mixed Emotions | Aug 11, 2022 | Attribute, Emotional Speech Synthesis | Unverified | 0 |
| A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis | Aug 3, 2022 | Speech Synthesis, text-to-speech | Unverified | 0 |
| Low-data? No problem: low-resource, language-agnostic conversational text-to-speech via F0-conditioned data augmentation | Jul 29, 2022 | Data Augmentation, text-to-speech | Unverified | 0 |
| Transplantation of Conversational Speaking Style with Interjections in Sequence-to-Sequence Speech Synthesis | Jul 25, 2022 | Data Augmentation, Speech Synthesis | Unverified | 0 |
| When Is TTS Augmentation Through a Pivot Language Useful? | Jul 20, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| SATTS: Speaker Attractor Text to Speech, Learning to Speak by Learning to Separate | Jul 13, 2022 | Speech Separation, text-to-speech | Unverified | 0 |
| A Cyclical Approach to Synthetic and Natural Speech Mismatch Refinement of Neural Post-filter for Low-cost Text-to-speech System | Jul 13, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| Text-driven Emotional Style Control and Cross-speaker Style Transfer in Neural TTS | Jul 13, 2022 | Language Modeling, Language Modelling | Unverified | 0 |
| End-to-end speech recognition modeling from de-identified data | Jul 12, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Huqariq: A Multilingual Speech Corpus of Native Languages of Peru for Speech Recognition | Jul 12, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| LIP: Lightweight Intelligent Preprocessor for meaningful text-to-speech | Jul 11, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| Mix and Match: An Empirical Study on Training Corpus Composition for Polyglot Text-To-Speech (TTS) | Jul 4, 2022 | Speech Synthesis, text-to-speech | Unverified | 0 |
| BERT, can HE predict contrastive focus? Predicting and controlling prominence in neural TTS using a language model | Jul 4, 2022 | Language Modeling, Language Modelling | Unverified | 0 |
| Unify and Conquer: How Phonetic Feature Representation Affects Polyglot Text-To-Speech (TTS) | Jul 4, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| Computer-assisted Pronunciation Training -- Speech synthesis is almost all you need | Jul 2, 2022 | All, Speech Synthesis | Unverified | 0 |
| Empathic Machines: Using Intermediate Features as Levers to Emulate Emotions in Text-To-Speech Systems | Jul 1, 2022 | text-to-speech, Text to Speech | Unverified | 0 |
| Fast Bilingual Grapheme-To-Phoneme Conversion | Jul 1, 2022 | Data Augmentation, Grapheme-to-Phoneme Conversion | Unverified | 0 |
| A Polyphone BERT for Polyphone Disambiguation in Mandarin Chinese | Jul 1, 2022 | Polyphone disambiguation, text-to-speech | Unverified | 0 |
| Automatic Evaluation of Speaker Similarity | Jul 1, 2022 | Speaker Verification, text-to-speech | Unverified | 0 |
| TTS-by-TTS 2: Data-selective augmentation for neural speech synthesis using ranking support vector machine with variational autoencoder | Jun 30, 2022 | Speech Synthesis, text-to-speech | Unverified | 0 |
| R-MelNet: Reduced Mel-Spectral Modeling for Neural TTS | Jun 30, 2022 | Decoder, GPU | Unverified | 0 |
| Improving Deliberation by Text-Only and Semi-Supervised Training | Jun 29, 2022 | Decoder, Language Modeling | Unverified | 0 |
| Simple and Effective Multi-sentence TTS with Expressive and Coherent Prosody | Jun 29, 2022 | Language Modeling, Language Modelling | Unverified | 0 |
| Comparison of Speech Representations for the MOS Prediction System | Jun 28, 2022 | Self-Supervised Learning, text-to-speech | Unverified | 0 |