| ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech | Nov 7, 2022 | Representation Learning, Speech Representation Learning | Code Available | 6 |
| W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training | Aug 7, 2021 | Contrastive Learning, Language Modeling | Code Available | 3 |
| Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction | Jan 5, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2 |
| Robust Self-Supervised Audio-Visual Speech Recognition | Jan 5, 2022 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available | 2 |
| EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning | Oct 17, 2024 | Representation Learning, Self-Supervised Learning | Code Available | 1 |
| FaceXHuBERT: Text-less Speech-driven E(X)pressive 3D Facial Animation Synthesis Using Self-Supervised Speech Representation Learning | Mar 9, 2023 | 3D Face Animation, Representation Learning | Code Available | 1 |
| DeCoAR 2.0: Deep Contextualized Acoustic Representations with Vector Quantization | Dec 11, 2020 | Diversity, Quantization | Code Available | 1 |
| An Unsupervised Autoregressive Model for Speech Representation Learning | Apr 5, 2019 | General Classification, model | Code Available | 1 |
| data2vec-aqc: Search for the right Teaching Assistant in the Teacher-Student training setup | Nov 2, 2022 | Automatic Speech Recognition (ASR), Language Modeling | Code Available | 1 |
| A^3T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing | Mar 18, 2022 | Representation Learning, Speaker Verification | Code Available | 1 |