| Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages | May 2, 2022 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 1 |
| STEMM: Self-learning with Speech-text Manifold Mixup for Speech Translation | Mar 20, 2022 | Machine TranslationSpeech-to-Text | CodeCode Available | 1 |
| A^3T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing | Mar 18, 2022 | Representation LearningSpeaker Verification | CodeCode Available | 1 |
| Regularizing End-to-End Speech Translation with Triangular Decomposition Agreement | Dec 21, 2021 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 1 |
| X-Vector based voice activity detection for multi-genre broadcast speech-to-text | Dec 9, 2021 | Action DetectionActivity Detection | CodeCode Available | 1 |
| Cross Attention Augmented Transducer Networks for Simultaneous Translation | Nov 1, 2021 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 1 |
| EdiTTS: Score-based Editing for Controllable Text-to-Speech | Oct 6, 2021 | Speech SynthesisSpeech-to-Text | CodeCode Available | 1 |
| Late reverberation suppression using U-nets | Oct 5, 2021 | DecoderSpeech Dereverberation | CodeCode Available | 1 |
| Speech Emotion Recognition with Multi-Task Learning | Sep 6, 2021 | Emotion ClassificationEmotion Recognition | CodeCode Available | 1 |
| One TTS Alignment To Rule Them All | Aug 23, 2021 | AllSpeech Synthesis | CodeCode Available | 1 |