| ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech | Nov 7, 2022 | Representation Learning, Speech Representation Learning | Code Available | 6 |
| W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training | Aug 7, 2021 | Contrastive Learning, Language Modeling | Code Available | 3 |
| Robust Self-Supervised Audio-Visual Speech Recognition | Jan 5, 2022 | Audio-Visual Speech Recognition, Automatic Speech Recognition | Code Available | 2 |
| Learning Audio-Visual Speech Representation by Masked Multimodal Cluster Prediction | Jan 5, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 2 |
| Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation | Jan 23, 2025 | Audio-Visual Speech Recognition, Multi-Task Learning | Code Available | 1 |
| EH-MAM: Easy-to-Hard Masked Acoustic Modeling for Self-Supervised Speech Representation Learning | Oct 17, 2024 | Representation Learning, Self-Supervised Learning | Code Available | 1 |
| Self-Supervised Syllable Discovery Based on Speaker-Disentangled HuBERT | Sep 16, 2024 | Acoustic Unit Discovery, Clustering | Code Available | 1 |
| The Effect of Batch Size on Contrastive Self-Supervised Speech Representation Learning | Feb 21, 2024 | Benchmarking, Representation Learning | Code Available | 1 |
| CLARA: Multilingual Contrastive Learning for Audio Representation Acquisition | Oct 18, 2023 | Audio Classification, Contrastive Learning | Code Available | 1 |
| Fast-HuBERT: An Efficient Training Framework for Self-Supervised Speech Representation Learning | Sep 25, 2023 | Representation Learning, Self-Supervised Learning | Code Available | 1 |
| QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning | Aug 31, 2023 | Representation Learning, Speech Representation Learning | Code Available | 1 |
| DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning | May 17, 2023 | Clustering, Language Modeling | Code Available | 1 |
| FaceXHuBERT: Text-less Speech-driven E(X)pressive 3D Facial Animation Synthesis Using Self-Supervised Speech Representation Learning | Mar 9, 2023 | 3D Face Animation, Representation Learning | Code Available | 1 |
| Structured Pruning of Self-Supervised Pre-trained Models for Speech Recognition and Understanding | Feb 27, 2023 | Model Compression, Representation Learning | Code Available | 1 |
| MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets | Nov 14, 2022 | Automatic Speech Recognition, Multi-Task Learning | Code Available | 1 |
| SLICER: Learning universal audio representations using low-resource self-supervised pre-training | Nov 2, 2022 | Audio Classification, Clustering | Code Available | 1 |
| data2vec-aqc: Search for the right Teaching Assistant in the Teacher-Student training setup | Nov 2, 2022 | Automatic Speech Recognition (ASR), Language Modeling | Code Available | 1 |
| Robust Data2vec: Noise-robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning | Oct 27, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| The Efficacy of Self-Supervised Speech Models for Audio Representations | Sep 26, 2022 | Onset Detection, Pitch Classification | Code Available | 1 |
| TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation | May 25, 2022 | Representation Learning, Rhythm | Code Available | 1 |
| Robust Disentangled Variational Speech Representation Learning for Zero-shot Voice Conversion | Mar 30, 2022 | Data Augmentation, Decoder | Code Available | 1 |
| LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT | Mar 29, 2022 | All, Automatic Speech Recognition | Code Available | 1 |
| A^3T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing | Mar 18, 2022 | Representation Learning, Speaker Verification | Code Available | 1 |
| XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale | Nov 17, 2021 | Language Identification, Representation Learning | Code Available | 1 |
| UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training | Oct 12, 2021 | Data Augmentation, Multi-Task Learning | Code Available | 1 |
| HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units | Jun 14, 2021 | Clustering, Language Modelling | Code Available | 1 |
| Supervised Speech Representation Learning for Parkinson's Disease Classification | Jun 1, 2021 | Classification, Representation Learning | Code Available | 1 |
| Using Radio Archives for Low-Resource Speech Recognition: Towards an Intelligent Virtual Assistant for Illiterate Users | Apr 27, 2021 | Language Identification, Representation Learning | Code Available | 1 |
| Fast Development of ASR in African Languages using Self Supervised Speech Representation Learning | Mar 16, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data | Jan 19, 2021 | Multi-Task Learning, Representation Learning | Code Available | 1 |
| DeCoAR 2.0: Deep Contextualized Acoustic Representations with Vector Quantization | Dec 11, 2020 | Diversity, Quantization | Code Available | 1 |
| Speech SIMCLR: Combining Contrastive and Reconstruction Objective for Self-supervised Speech Representation Learning | Oct 27, 2020 | Emotion Recognition, Representation Learning | Code Available | 1 |
| An Unsupervised Autoregressive Model for Speech Representation Learning | Apr 5, 2019 | General Classification, model | Code Available | 1 |
| Unsupervised speech representation learning using WaveNet autoencoders | Jan 25, 2019 | Acoustic Unit Discovery, Decoder | Code Available | 1 |
| HYFuse: Aligning Heterogeneous Speech Pre-Trained Representations in Hyperbolic Space for Speech Emotion Recognition | Jun 3, 2025 | Emotion Recognition, Representation Learning | Unverified | 0 |
| DuRep: Dual-Mode Speech Representation Learning via ASR-Aware Distillation | May 26, 2025 | Representation Learning, Speech Representation Learning | Unverified | 0 |
| Universal Semantic Disentangled Privacy-preserving Speech Representation Learning | May 19, 2025 | Decoder, Privacy Preserving | Unverified | 0 |
| UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation | Mar 2, 2025 | Decoder, Representation Learning | Unverified | 0 |
| k2SSL: A Faster and Better Framework for Self-Supervised Speech Representation Learning | Nov 26, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 0 |
| JOOCI: a Framework for Learning Comprehensive Speech Representations | Oct 14, 2024 | Representation Learning, Speech Representation Learning | Unverified | 0 |
| Are Music Foundation Models Better at Singing Voice Deepfake Detection? Far-Better Fuse them with Speech Foundation Models | Sep 21, 2024 | DeepFake Detection, Face Swapping | Unverified | 0 |
| Progressive Residual Extraction based Pre-training for Speech Representation Learning | Aug 31, 2024 | Emotion Recognition, Representation Learning | Unverified | 0 |
| Speech Representation Learning Revisited: The Necessity of Separate Learnable Parameters and Robust Data Augmentation | Aug 20, 2024 | Data Augmentation, Representation Learning | Unverified | 0 |
| Towards the Next Frontier in Speech Representation Learning Using Disentanglement | Jul 2, 2024 | Disentanglement, Representation Learning | Unverified | 0 |
| Towards Robust Speech Representation Learning for Thousands of Languages | Jun 30, 2024 | Representation Learning, Self-Supervised Learning | Unverified | 0 |
| mHuBERT-147: A Compact Multilingual HuBERT Model | Jun 10, 2024 | Automatic Speech Recognition (ASR), Diversity | Code Available | 0 |
| Emotion-Aware Speech Self-Supervised Representation Learning with Intensity Knowledge | Jun 10, 2024 | Representation Learning, Self-Supervised Learning | Unverified | 0 |
| XLAVS-R: Cross-Lingual Audio-Visual Speech Representation Learning for Noise-Robust Speech Perception | Mar 21, 2024 | Audio-Visual Speech Recognition, Representation Learning | Unverified | 0 |
| An Efficient End-to-End Approach to Noise Invariant Speech Features via Multi-Task Learning | Mar 13, 2024 | Denoising, Knowledge Distillation | Code Available | 0 |
| UNIT-DSR: Dysarthric Speech Reconstruction System Using Speech Unit Normalization | Jan 26, 2024 | Decoder, Domain Adaptation | Unverified | 0 |