Self-Supervised Learning

Self-Supervised Learning was proposed to exploit unlabeled data while keeping the training recipe that made supervised learning successful. Producing a dataset with good labels is expensive, whereas unlabeled data is generated all the time, so the motivation of Self-Supervised Learning is to make use of this large amount of unlabeled data. The main idea is to generate labels from the unlabeled data itself, based on the structure or characteristics of the data, and then train on the resulting pseudo-labeled data in a supervised manner. Self-Supervised Learning is widely used in representation learning to make a model learn the latent features of the data, and it is often employed in computer vision, video processing, and robot control.

Source: Self-supervised Point Set Local Descriptors for Point Cloud Registration

Image source: LeCun
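As one concrete illustration of generating labels from the structure of the data, here is a minimal sketch of rotation prediction, a well-known image pretext task. It is an assumption chosen for illustration, not the method of the cited source. Each unlabeled image is rotated by 0, 90, 180, and 270 degrees, and the rotation index becomes the label; any ordinary supervised classifier (not shown) could then be trained on the resulting pairs. The helper name `make_rotation_pretext_batch` is hypothetical, and the sketch assumes square images stored in a NumPy array.

```python
import numpy as np


def make_rotation_pretext_batch(images):
    """Turn unlabeled images into (input, label) pairs for rotation prediction.

    Hypothetical helper for illustration. `images` is an array of shape
    (N, H, W) or (N, H, W, C) with H == W, so every rotated copy keeps the
    same shape and all copies can be stacked into one array.
    Returns (inputs, labels): inputs has shape (4 * N, ...) and labels[i] = k
    means inputs[i] is its source image rotated by k * 90 degrees.
    """
    images = np.asarray(images)
    if images.ndim < 3 or images.shape[1] != images.shape[2]:
        raise ValueError("expected square images shaped (N, H, W[, C])")

    inputs, labels = [], []
    for img in images:
        for k in range(4):
            # The label comes from the transformation itself, not from a human annotator.
            inputs.append(np.rot90(img, k=k, axes=(0, 1)))
            labels.append(k)
    return np.stack(inputs), np.array(labels, dtype=np.int64)


# Usage: 8 unlabeled 32x32 RGB images become 32 labeled training examples.
unlabeled = np.random.default_rng(0).random((8, 32, 32, 3))
x, y = make_rotation_pretext_batch(unlabeled)
print(x.shape, y[:4])  # (32, 32, 32, 3) [0 1 2 3]
```

Rotation prediction is only one possible pretext task; the point it illustrates is that the supervision signal is manufactured from the unlabeled data rather than collected from annotators.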

Papers

Showing 2701–2750 of 5044 papers

Title	Date	Tasks	Status
Video as the New Language for Real-World Decision Making	Feb 27, 2024	Decision Making, In-Context Learning	Unverified
Video-Foley: Two-Stage Video-To-Sound Generation via Temporal Event Condition For Foley Sound	Aug 21, 2024	Audio Generation, Audio Synthesis	Unverified
Video Jigsaw: Unsupervised Learning of Spatiotemporal Context for Video Action Recognition	Aug 22, 2018	Action Recognition, Activity Recognition	Unverified
Video Representation Learning by Recognizing Temporal Transformations	Jul 21, 2020	Action Recognition, Representation Learning	Unverified
Video Transformers: A Survey	Jan 16, 2022	Action Classification, Self-Supervised Learning	Unverified
Video Understanding as Machine Translation	Jun 12, 2020	Machine Translation, Metric Learning	Unverified
Video-XL-Pro: Reconstructive Token Compression for Extremely Long Video Understanding	Mar 24, 2025	8k, GPU	Unverified
VieSum: How Robust Are Transformer-based Models on Vietnamese Summarization?	Oct 8, 2021	Abstractive Text Summarization, Decoder	Unverified
VietASR: Achieving Industry-level Vietnamese ASR with 50-hour labeled data and Large-Scale Speech Pretraining	May 23, 2025	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
ViewMix: Augmentation for Robust Representation in Self-Supervised Learning	Sep 6, 2023	Representation Learning, Self-Supervised Learning	Unverified
ViewNet: Unsupervised Viewpoint Estimation from Conditional Generation	Dec 1, 2022	Image Reconstruction, Self-Supervised Learning	Unverified
Views Can Be Deceiving: Improved SSL Through Feature Space Augmentation	May 28, 2024	Representation Learning, Self-Supervised Learning	Unverified
VIGraph: Generative Self-supervised Learning for Class-Imbalanced Node Classification	Nov 2, 2023	Contrastive Learning, Node Classification	Unverified
Vi-MIX FOR SELF-SUPERVISED VIDEO REPRESENTATION	Sep 29, 2021	Action Recognition, Representation Learning	Unverified
Virtual Node Generation for Node Classification in Sparsely-Labeled Graphs	Sep 12, 2024	Graph Learning, Meta-Learning	Unverified
Visible and infrared self-supervised fusion trained on a single example	Jul 9, 2023	object-detection, Object Detection	Unverified
Vision-Language Modeling with Regularized Spatial Transformer Networks for All Weather Crosswind Landing of Aircraft	May 9, 2024	All, Language Modeling	Unverified
Vision Learners Meet Web Image-Text Pairs	Jan 17, 2023	Benchmarking, Self-Supervised Learning	Unverified
Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision	Feb 16, 2022	Action Classification, Action Recognition	Unverified
Vision Transformers: State of the Art and Research Challenges	Jul 7, 2022	3D Reconstruction, Image Segmentation	Unverified
Visual Lexicon: Rich Image Features in Language Space	Dec 9, 2024	Image Generation, Image Reconstruction	Unverified
Visually Guided Self Supervised Learning of Speech Representations	Jan 13, 2020	Emotion Recognition, Representation Learning	Unverified
Visual Representation Learning with Stochastic Frame Prediction	Jun 11, 2024	Decoder, Pose Tracking	Unverified
Visuomotor Control in Multi-Object Scenes Using Object-Aware Representations	May 12, 2022	Object, Object Localization	Unverified
ViTAR: Vision Transformer with Any Resolution	Mar 27, 2024	Self-Supervised Learning, Semantic Segmentation	Unverified
ViT-TTS: Visual Text-to-Speech with Scalable Diffusion Transformer	May 22, 2023	Decoder, Denoising	Unverified
VLMs-Guided Representation Distillation for Efficient Vision-Based Reinforcement Learning	Jan 1, 2025	Decision Making, reinforcement-learning	Unverified
VOODOO 3D: Volumetric Portrait Disentanglement for One-Shot 3D Head Reenactment	Dec 7, 2023	Disentanglement, Self-Supervised Learning	Unverified
VRMM: A Volumetric Relightable Morphable Head Model	Feb 6, 2024	3D Face Reconstruction, Face Reconstruction	Unverified
Watching Too Much Television is Good: Self-Supervised Audio-Visual Representation Learning from Movies and TV Shows	Jun 16, 2021	Contrastive Learning, Representation Learning	Unverified
Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR	Apr 11, 2023	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Wav2Vec-Aug: Improved self-supervised training with limited data	Jun 27, 2022	Data Augmentation, Self-Supervised Learning	Unverified
Wav2vec-C: A Self-supervised Model for Speech Representation Learning	Mar 9, 2021	Quantization, Representation Learning	Unverified
Wav2vec-Switch: Contrastive Learning from Original-noisy Speech Pairs for Robust Speech Recognition	Oct 11, 2021	Automatic Speech Recognition, Automatic Speech Recognition (ASR)	Unverified
Wavelet-Driven Masked Image Modeling: A Path to Efficient Visual Representation	Mar 2, 2025	Representation Learning, Self-Supervised Learning	Unverified
WavFT: Acoustic model finetuning with labelled and unlabelled data	Apr 1, 2022	Self-Supervised Learning	Unverified
Weakly Augmented Variational Autoencoder in Time Series Anomaly Detection	Jan 7, 2024	Anomaly Detection, Self-Supervised Learning	Unverified
Weakly Supervised 3D Human Pose and Shape Reconstruction with Normalizing Flows	Mar 23, 2020	3D human pose and shape estimation, 3D Human Pose Estimation	Unverified
Weakly Supervised Class-Agnostic Motion Prediction for Autonomous Driving	Jan 1, 2023	Autonomous Driving, motion prediction	Unverified
Weakly-Supervised Speech Pre-training: A Case Study on Target Speech Recognition	May 25, 2023	Denoising, Self-Supervised Learning	Unverified
Weakly-Supervised Surgical Phase Recognition	Oct 26, 2023	Few-Shot Learning, Self-Supervised Learning	Unverified
WeakSTIL: Weak whole-slide image level stromal tumor infiltrating lymphocyte scores are all you need	Sep 13, 2021	All, Decision Making	Unverified
Wearable Accelerometer Foundation Models for Health via Knowledge Distillation	Dec 15, 2024	Activity Recognition, cross-modal alignment	Unverified
Wearable-Based Real-time Freezing of Gait Detection in Parkinson's Disease Using Self-Supervised Learning	Oct 8, 2024	Self-Supervised Learning	Unverified
Wearable data from subjects playing Super Mario, sitting university exams, or performing physical exercise help detect acute mood episodes via self-supervised learning	Nov 7, 2023	Body Detection, Emotion Recognition	Unverified
WeedCLR: Weed Contrastive Learning through Visual Representations with Class-Optimized Loss in Long-Tailed Datasets	Oct 19, 2023	Contrastive Learning, image-classification	Unverified
WeedNet: A Foundation Model-Based Global-to-Local AI Approach for Real-Time Weed Species Identification and Classification	May 25, 2025	Self-Supervised Learning	Unverified
Weighted Ensemble Self-Supervised Learning	Nov 18, 2022	Diversity, Self-Supervised Learning	Unverified
WeLM: A Well-Read Pre-trained Language Model for Chinese	Sep 21, 2022	Language Modeling, Language Modelling	Unverified
WERank: Towards Rank Degradation Prevention for Self-Supervised Learning Using Weight Regularization	Feb 14, 2024	Data Augmentation, Self-Supervised Learning	Unverified

Benchmark Results

Dataset: DABS
#	Model	Metric	Claimed	Verified	Status
1	Pretraining: None	Images & Text	57.5	—	Unverified
2	Pretraining: ShED	Images & Text	54.3	—	Unverified
3	Pretraining: e-Mix	Images & Text	48.9	—	Unverified

Dataset: STL-10
#	Model	Metric	Claimed	Verified	Status
1	ResNet50	Accuracy	91.7	—	Unverified
2	ResNet18	Accuracy	91.02	—	Unverified
3	MV-MR	Accuracy	89.67	—	Unverified

Dataset: CIFAR10
#	Model	Metric	Claimed	Verified	Status
1	ResNet50	average top-1 classification accuracy	93.89	—	Unverified
2	ResNet18	average top-1 classification accuracy	92.58	—	Unverified

Dataset: cifar100
#	Model	Metric	Claimed	Verified	Status
1	ResNet50	average top-1 classification accuracy	72.51	—	Unverified
2	ResNet18	average top-1 classification accuracy	69.31	—	Unverified

Dataset: ImageNet-100 (TEMI Split)
#	Model	Metric	Claimed	Verified	Status
1	CorInfomax (ResNet50)	Top-1 Accuracy	82.64	—	Unverified
2	CorInfomax (ResNet18)	Top-1 Accuracy	80.48	—	Unverified

Dataset: TinyImageNet
#	Model	Metric	Claimed	Verified	Status
1	ResNet50	average top-1 classification accuracy	51.84	—	Unverified
2	ResNet18	average top-1 classification accuracy	51.67	—	Unverified

Dataset: CIFAR-10
#	Model	Metric	Claimed	Verified	Status
1	CorInfomax (ResNet18)	Top-1 Accuracy	93.18	—	Unverified

Dataset: CIFAR-100
#	Model	Metric	Claimed	Verified	Status
1	CorInfomax (ResNet18)	Top-1 Accuracy	71.61	—	Unverified

Dataset: CREMA-D
#	Model	Metric	Claimed	Verified	Status
1	Hybrid BYOL-S/CvT	Accuracy	67.2	—	Unverified

Dataset: Tiny ImageNet
#	Model	Metric	Claimed	Verified	Status
1	CorInfomax (ResNet50)	Top-1 Accuracy	54.86	—	Unverified