| An Unsupervised Autoregressive Model for Speech Representation Learning | Apr 5, 2019 | General Classification, model | Code Available | 1 |
| DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning | May 17, 2023 | Clustering, Language Modeling | Code Available | 1 |
| FaceXHuBERT: Text-less Speech-driven E(X)pressive 3D Facial Animation Synthesis Using Self-Supervised Speech Representation Learning | Mar 9, 2023 | 3D Face Animation, Representation Learning | Code Available | 1 |
| LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT | Mar 29, 2022 | All, Automatic Speech Recognition | Code Available | 1 |
| Robust Data2vec: Noise-robust Speech Representation Learning for ASR by Combining Regression and Improved Contrastive Learning | Oct 27, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 1 |
| Speech SIMCLR: Combining Contrastive and Reconstruction Objective for Self-supervised Speech Representation Learning | Oct 27, 2020 | Emotion Recognition, Representation Learning | Code Available | 1 |
| DeCoAR 2.0: Deep Contextualized Acoustic Representations with Vector Quantization | Dec 11, 2020 | Diversity, Quantization | Code Available | 1 |
| MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets | Nov 14, 2022 | Automatic Speech Recognition, Multi-Task Learning | Code Available | 1 |
| Multi-Task Corrupted Prediction for Learning Robust Audio-Visual Speech Representation | Jan 23, 2025 | Audio-Visual Speech Recognition, Multi-Task Learning | Code Available | 1 |
| Disentangled Speech Representation Learning for One-Shot Cross-lingual Voice Conversion Using β-VAE | Oct 25, 2022 | Disentanglement, Representation Learning | Unverified | 0 |
| Disentangled Speech Representation Learning Based on Factorized Hierarchical Variational Autoencoder with Self-Supervised Objective | Apr 5, 2022 | Disentanglement, Representation Learning | Unverified | 0 |
| A Comparison of Discrete Latent Variable Models for Speech Representation Learning | Oct 24, 2020 | Phoneme Recognition, Representation Learning | Unverified | 0 |
| Disentangled Feature Learning for Real-Time Neural Speech Coding | Nov 22, 2022 | Disentanglement, Representation Learning | Unverified | 0 |
| ATCSpeechNet: A multilingual end-to-end speech recognition framework for air traffic control systems | Feb 17, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| A Brief Overview of Unsupervised Neural Speech Representation Learning | Mar 1, 2022 | Representation Learning, Speech Representation Learning | Unverified | 0 |
| Deep Representation Learning in Speech Processing: Challenges, Recent Advances, and Future Trends | Jan 2, 2020 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Are Music Foundation Models Better at Singing Voice Deepfake Detection? Far-Better Fuse them with Speech Foundation Models | Sep 21, 2024 | DeepFake Detection, Face Swapping | Unverified | 0 |
| Adversarially learning disentangled speech representations for robust multi-factor voice conversion | Jan 30, 2021 | Representation Learning, Rhythm | Unverified | 0 |
| HYFuse: Aligning Heterogeneous Speech Pre-Trained Representations in Hyperbolic Space for Speech Emotion Recognition | Jun 3, 2025 | Emotion Recognition, Representation Learning | Unverified | 0 |
| Experiments on Turkish ASR with Self-Supervised Speech Representation Learning | Oct 13, 2022 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| Application of Knowledge Distillation to Multi-task Speech Representation Learning | Oct 29, 2022 | Keyword Spotting, Knowledge Distillation | Unverified | 0 |
| Improving the Robustness of DistilHuBERT to Unseen Noisy Conditions via Data Augmentation, Curriculum Learning, and Multi-Task Enhancement | Nov 12, 2022 | Data Augmentation, Emotion Recognition | Unverified | 0 |
| Improving Unsupervised Subword Modeling via Disentangled Speech Representation Learning and Transformation | Jun 17, 2019 | Clustering, Representation Learning | Unverified | 0 |
| Input-independent Attention Weights Are Expressive Enough: A Study of Attention in Self-supervised Audio Transformers | Jun 9, 2020 | General Classification, Representation Learning | Unverified | 0 |
| General-Purpose Speech Representation Learning through a Self-Supervised Multi-Granularity Framework | Feb 3, 2021 | Classification, Emotion Classification | Unverified | 0 |