Self-Supervised Learning

Self-Supervised Learning was proposed to exploit unlabeled data by building on the success of supervised learning. Producing a dataset with good labels is expensive, whereas unlabeled data is generated all the time, and the motivation of Self-Supervised Learning is to make use of this large supply of unlabeled data. The main idea is to generate labels from the unlabeled data itself, based on the structure or characteristics of the data, and then train on the resulting self-labeled data in a supervised manner. Self-Supervised Learning is widely used in representation learning to make a model learn the latent features of the data, and it is often employed in computer vision, video processing and robot control.
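To make "generating labels from the data itself" concrete, the sketch below implements one common pretext task, masked-element prediction. This is the idea behind BERT-style masked language modeling and, in other modalities, masked autoencoders: some positions of an unlabeled sequence are hidden, and the hidden originals become the targets. It is a minimal sketch under stated assumptions, not the method of any paper listed on this page. The function name `make_masked_examples`, the `MASK` placeholder and the default masking probability are hypothetical choices made for illustration.

```python
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical placeholder token that hides an element from the model.
MASK = "[MASK]"


def make_masked_examples(
    sequence: Sequence[str], mask_prob: float = 0.15, seed: int = 0
) -> Tuple[List[str], List[Optional[str]]]:
    """Turn one unlabeled sequence into an (inputs, targets) training pair.

    Each position is hidden with probability `mask_prob`. The target at a
    hidden position is the original element; visible positions get None,
    meaning "no label, no loss here".
    """
    rng = random.Random(seed)  # seeded so the example is reproducible
    inputs: List[str] = []
    targets: List[Optional[str]] = []
    for token in sequence:
        if rng.random() < mask_prob:
            inputs.append(MASK)    # the model sees a blank here...
            targets.append(token)  # ...and must recover the original token
        else:
            inputs.append(token)
            targets.append(None)   # visible positions carry no label
    # Ensure every non-empty sequence yields at least one label.
    if sequence and all(t is None for t in targets):
        i = rng.randrange(len(sequence))
        inputs[i] = MASK
        targets[i] = sequence[i]
    return inputs, targets


if __name__ == "__main__":
    tokens = "unlabeled data supplies its own training signal".split()
    inputs, targets = make_masked_examples(tokens, mask_prob=0.3, seed=1)
    print(inputs)
    print(targets)
```

Each (inputs, targets) pair can then be passed to an ordinary supervised trainer that computes a loss only where the target is not None. Real masked-modeling setups add refinements this sketch omits; BERT, for instance, sometimes replaces a selected token with a random token or leaves it unchanged instead of always inserting a mask.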

Source: Self-supervised Point Set Local Descriptors for Point Cloud Registration

Image source: LeCun

Papers


Showing 1–50 of 5044 papers

Title	Date	Tasks	Status	Hype
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer	Sep 1, 2024	Self-Supervised Learning, text-to-speech	Code Available	9
Metis: A Foundation Speech Generation Model with Masked Generative Pre-training	Feb 5, 2025	Self-Supervised Learning, Speech Enhancement	Code Available	9
Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis	May 14, 2025	Denoising, Depth Estimation	Code Available	7
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning	Jun 11, 2025	Action Anticipation, Large Language Model	Code Available	7
What's Behind the Mask: Understanding Masked Graph Modeling for Graph Autoencoders	May 20, 2022	Contrastive Learning, Link Prediction	Code Available	6
AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding	May 6, 2024	Metric Learning, Self-Supervised Learning	Code Available	5
Know Your Self-supervised Learning: A Survey on Image-based Generative and Discriminative Training	May 23, 2023	Contrastive Learning, Self-Supervised Learning	Code Available	5
Transformers without Normalization	Mar 13, 2025	Self-Supervised Learning	Code Available	5
Learning to (Learn at Test Time): RNNs with Expressive Hidden States	Jul 5, 2024	16k, 8k	Code Available	5
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think	Oct 9, 2024	Denoising, Image Generation	Code Available	5
Awesome Multi-modal Object Tracking	May 23, 2024	Autonomous Driving, Knowledge Distillation	Code Available	5
TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch	Oct 27, 2023	Self-Supervised Learning, Speech Enhancement	Code Available	4
Sonata: Self-Supervised Learning of Reliable Point Representations	Mar 20, 2025	3D Semantic Segmentation, Self-Supervised Learning	Code Available	4
SSL4EO-L: Datasets and Foundation Models for Landsat Imagery	Jun 15, 2023	Cloud Detection, Earth Observation	Code Available	4
A Framework For Contrastive Self-Supervised Learning And Designing A New Approach	Aug 31, 2020	Data Augmentation, Image Classification	Code Available	4
Generalized Recorrupted-to-Recorrupted: Self-Supervised Learning Beyond Gaussian Noise	Dec 5, 2024	Denoising, Image Restoration	Code Available	4
A Survey on Large Language Models for Recommendation	May 31, 2023	Recommendation Systems	Code Available	4
GigaAM: Efficient Self-Supervised Learner for Speech Recognition	Jun 1, 2025	Automatic Speech Recognition, Language Modeling	Code Available	4
Multimodal Whole Slide Foundation Model for Pathology	Nov 29, 2024	Cross-Modal Retrieval, model	Code Available	4
Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN	May 27, 2022	Image Classification, Instance Segmentation	Code Available	4
TSLANet: Rethinking Transformers for Time Series Representation Learning	Apr 12, 2024	Anomaly Detection, Computational Efficiency	Code Available	3
STORM: Spatio-Temporal Reconstruction Model for Large-Scale Outdoor Scenes	Dec 31, 2024	Dynamic Reconstruction, Scene Flow Estimation	Code Available	3
The T05 System for The VoiceMOS Challenge 2024: Transfer Learning from Deep Image Classifier to Naturalness MOS Prediction of High-Quality Synthetic Speech	Sep 14, 2024	Self-Supervised Learning, Transfer Learning	Code Available	3
SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining	Mar 23, 2025	3DGS, Benchmarking	Code Available	3
Robust and Efficient Medical Imaging with Self-Supervision	May 19, 2022	Diagnostic, Representation Learning	Code Available	3
MuQ: Self-Supervised Music Representation Learning with Mel Residual Vector Quantization	Jan 2, 2025	Contrastive Learning, Key Detection	Code Available	3
SARATR-X: Toward Building A Foundation Model for SAR Target Recognition	May 15, 2024	2D Object Detection, Earth Observation	Code Available	3
Spikformer V2: Join the High Accuracy Club on ImageNet with an SNN Ticket	Jan 4, 2024	image-classification, Image Classification	Code Available	3
VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis	Feb 27, 2024	Contrastive Learning, Medical Image Analysis	Code Available	3
Calibre: Towards Fair and Accurate Personalized Federated Learning with Self-Supervised Learning	Dec 28, 2024	Fairness, Federated Learning	Code Available	3
Moving Object Segmentation: All You Need Is SAM (and Flow)	Apr 18, 2024	All, Motion Segmentation	Code Available	3
Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D	Apr 19, 2025	Decoder, Object Localization	Code Available	3
A Survey on Self-Supervised Learning for Non-Sequential Tabular Data	Feb 2, 2024	Contrastive Learning, Descriptive	Code Available	3
MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining	Mar 20, 2024	Aerial Scene Classification, Building change detection for remote sensing images	Code Available	3
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs	Jan 11, 2024	Representation Learning, Self-Supervised Learning	Code Available	3
ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models	Jan 30, 2024	Self-Supervised Learning, Speaker Recognition	Code Available	3
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation	Dec 23, 2023	Emotion Recognition, Self-Supervised Learning	Code Available	3
Accelerating Goal-Conditioned RL Algorithms and Research	Aug 20, 2024	GPU, reinforcement-learning	Code Available	3
Efficient and Generalizable Speaker Diarization via Structured Pruning of Self-Supervised Models	Jun 23, 2025	Domain Adaptation, GPU	Code Available	3
EfficientTrain++: Generalized Curriculum Learning for Efficient Visual Backbone Training	May 14, 2024	Data Augmentation, Self-Supervised Learning	Code Available	3
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach	May 24, 2024	Clustering, Self-Supervised Learning	Code Available	3
Leveraging Self-Supervised Learning for Speaker Diarization	Sep 14, 2024	Self-Supervised Learning, speaker-diarization	Code Available	3
EAT: Self-Supervised Pre-Training with Efficient Audio Transformer	Jan 7, 2024	Audio Classification, Self-Supervised Learning	Code Available	3
Beyond Appearance: a Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks	Mar 30, 2023	Human Parsing, Pedestrian Attribute Recognition	Code Available	3
EEGPT: Pretrained Transformer for Universal and Reliable Representation of EEG Signals	Jan 1, 2024	EEG, Representation Learning	Code Available	3
Emergence of Segmentation with Minimalistic White-Box Transformers	Aug 30, 2023	Segmentation, Self-Supervised Learning	Code Available	3
How to build the best medical image segmentation algorithm using foundation models: a comprehensive empirical study with Segment Anything Model	Apr 15, 2024	Decoder, Image Segmentation	Code Available	3
Pushing the limits of raw waveform speaker recognition	Mar 16, 2022	Self-Supervised Learning, Speaker Recognition	Code Available	3
ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders	Jan 2, 2023	Object Detection, Representation Learning	Code Available	3
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations	Jun 20, 2020	Quantization, Self-Supervised Learning	Code Available	3


Benchmark Results

DABS
#	Model	Metric	Claimed	Verified	Status
1	Pretraining: None	Images & Text	57.5	—	Unverified
2	Pretraining: ShED	Images & Text	54.3	—	Unverified
3	Pretraining: e-Mix	Images & Text	48.9	—	Unverified

STL-10
#	Model	Metric	Claimed	Verified	Status
1	ResNet50	Accuracy	91.7	—	Unverified
2	ResNet18	Accuracy	91.02	—	Unverified
3	MV-MR	Accuracy	89.67	—	Unverified

CIFAR10
#	Model	Metric	Claimed	Verified	Status
1	ResNet50	average top-1 classification accuracy	93.89	—	Unverified
2	ResNet18	average top-1 classification accuracy	92.58	—	Unverified

cifar100
#	Model	Metric	Claimed	Verified	Status
1	ResNet50	average top-1 classification accuracy	72.51	—	Unverified
2	ResNet18	average top-1 classification accuracy	69.31	—	Unverified

ImageNet-100 (TEMI Split)
#	Model	Metric	Claimed	Verified	Status
1	CorInfomax (ResNet50)	Top-1 Accuracy	82.64	—	Unverified
2	CorInfomax (ResNet18)	Top-1 Accuracy	80.48	—	Unverified

TinyImageNet
#	Model	Metric	Claimed	Verified	Status
1	ResNet50	average top-1 classification accuracy	51.84	—	Unverified
2	ResNet18	average top-1 classification accuracy	51.67	—	Unverified

CIFAR-10
#	Model	Metric	Claimed	Verified	Status
1	CorInfomax (ResNet18)	Top-1 Accuracy	93.18	—	Unverified

CIFAR-100
#	Model	Metric	Claimed	Verified	Status
1	CorInfomax (ResNet18)	Top-1 Accuracy	71.61	—	Unverified

CREMA-D
#	Model	Metric	Claimed	Verified	Status
1	Hybrid BYOL-S/CvT	Accuracy	67.2	—	Unverified

Tiny ImageNet
#	Model	Metric	Claimed	Verified	Status
1	CorInfomax (ResNet50)	Top-1 Accuracy	54.86	—	Unverified