SOTAVerified

Self-Supervised Image Classification

This is the task of image classification using representations learnt with self-supervised learning. Self-supervised methods generally involve a pretext task that is solved to learn a good representation and a loss function to learn with. One example of a loss function is an autoencoder-based loss, where the goal is to reconstruct an image pixel by pixel. A more popular recent example is a contrastive loss, which measures the similarity of sample pairs in a representation space, and where the target can vary rather than being fixed, as the reconstruction target of an autoencoder is.
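As a rough illustration of the contrastive idea, the sketch below computes an InfoNCE-style loss in plain Python. It is a simplified, one-directional variant written for this page, not the exact loss of any listed paper; the function names, the `temperature` value, and the toy embeddings are all assumptions.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length (assumes the vector is not all zeros).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def info_nce_loss(anchors, positives, temperature=0.1):
    """Mean InfoNCE-style loss over a batch of embedding pairs.

    Row i of `positives` is treated as the positive for anchor i;
    every other row in `positives` acts as a negative.
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching pair as the correct "class".
        total += log_denominator - logits[i]
    return total / len(a)

# Toy embeddings (hypothetical): two views that agree, and two that are mismatched.
aligned = [[1.0, 0.0], [0.0, 1.0]]
shuffled = [[0.0, 1.0], [1.0, 0.0]]
loss_aligned = info_nce_loss(aligned, aligned)
loss_shuffled = info_nce_loss(aligned, shuffled)
```

The loss is small when each anchor is most similar to its own positive and large when it is more similar to another sample's positive, which is the sense in which the target varies with the batch rather than being a fixed reconstruction.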

A common evaluation protocol is to train a linear classifier on top of (frozen) representations learnt by self-supervised methods. The leaderboards for the linear evaluation protocol can be found below. In practice, it is more common to fine-tune features on a downstream task. An alternative evaluation protocol therefore uses semi-supervised learning and fine-tunes on a percentage of the labels. The leaderboards for the fine-tuning protocol can be accessed here.
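The linear evaluation protocol can be sketched as follows: features come from an encoder that is never updated, and only a linear classifier is trained on top of them. This is a minimal toy version under stated assumptions; `frozen_encoder` is a hypothetical stand-in for a pretrained backbone, the data is made up, and the classifier is a binary logistic regression trained with full-batch gradient descent.

```python
import math

def frozen_encoder(x):
    # Hypothetical stand-in for a pretrained backbone: a fixed mapping
    # whose parameters are never updated during evaluation.
    return [x[0] + x[1], x[0] - x[1]]

def train_linear_probe(features, labels, lr=0.1, epochs=100):
    """Fit a binary logistic-regression probe with full-batch gradient descent."""
    dim = len(features[0])
    n = len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for f, y in zip(features, labels):
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - y  # sigmoid(z) - label
            grad_w = [gw + err * fi for gw, fi in zip(grad_w, f)]
            grad_b += err
        # Only the probe's weights change; the encoder is untouched.
        w = [wi - lr * gw / n for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, f):
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0

# Toy inputs and labels (hypothetical). Features are computed once and reused.
images = [[2.0, 1.0], [1.5, 0.5], [1.0, 2.0], [0.5, 1.5]]
labels = [1, 1, 0, 0]
features = [frozen_encoder(x) for x in images]
w, b = train_linear_probe(features, labels)
# Accuracy on the training set, for illustration only; a real evaluation
# would report accuracy on a held-out split.
accuracy = sum(predict(w, b, f) == y for f, y in zip(features, labels)) / len(labels)
```

Under the fine-tuning protocol, by contrast, the encoder's own parameters would also be updated, typically using only a subset of the labels.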

You may want to read some blog posts before reading the papers and checking the leaderboards:

There is also Yann LeCun's talk at AAAI-20 which you can watch here (35:00+).

(Image credit: A Simple Framework for Contrastive Learning of Visual Representations)

Papers

Showing 1-50 of 110 papers

Title | Status | Hype
Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective | Code | 2
SynCo: Synthetic Hard Negatives in Contrastive Learning for Better Unsupervised Visual Representations | Code | 0
Unsupervised Representation Learning by Balanced Self Attention Matching | Code | 0
Multi-label Cluster Discrimination for Visual Representation Learning | Code | 4
Estimating Physical Information Consistency of Channel Data Augmentation for Remote Sensing Images | - | 0
IPCL: Iterative Pseudo-Supervised Contrastive Learning to Improve Self-Supervised Feature Representation | Code | 0
MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations | Code | 1
Perceptual Group Tokenizer: Building Perception with Iterative Grouping | - | 0
Vision Transformers Need Registers | Code | 6
Masked Image Residual Learning for Scaling Deeper Vision Transformers | Code | 0
DINO-CXR: A self supervised method based on vision transformer for chest X-ray classification | - | 0
Masking meets Supervision: A Strong Learning Alliance | Code | 1
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities | Code | 3
Contrastive Tuning: A Little Help to Make Masked Autoencoders Forget | Code | 1
DINOv2: Learning Robust Visual Features without Supervision | Code | 6
Unicom: Universal and Compact Representation Learning for Image Retrieval | Code | 2
VNE: An Effective Method for Improving Deep Representation by Manipulating Eigenvalue Distribution | Code | 1
MV-MR: multi-views and multi-representations for self-supervised learning and knowledge distillation | Code | 0
All4One: Symbiotic Neighbour Contrastive Learning via Self-Attention and Redundancy Reduction | Code | 0
Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling | Code | 3
Learning by Sorting: Self-supervised Learning with Group Ordering Constraints | Code | 1
Improving Visual Representation Learning through Perceptual Understanding | Code | 0
Masked Reconstruction Contrastive Learning with Information Bottleneck Principle | - | 0
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale | Code | 0
Towards Sustainable Self-supervised Learning | Code | 1
Exploring Target Representations for Masked Autoencoders | Code | 0
BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers | Code | 0
Model-Aware Contrastive Learning: Towards Escaping the Dilemmas | Code | 0
Bootstrapped Masked Autoencoders for Vision BERT Pretraining | Code | 1
Unsupervised Visual Representation Learning by Synchronous Momentum Grouping | Code | 0
Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN | Code | 4
Multiplexed Immunofluorescence Brain Image Analysis Using Self-Supervised Dual-Loss Adaptive Masked Autoencoder | Code | 1
Masked Siamese Networks for Label-Efficient Learning | Code | 2
mc-BEiT: Multi-choice Discretization for Image BERT Pre-training | Code | 1
Mugs: A Multi-Granular Self-Supervised Learning Framework | Code | 1
CaCo: Both Positive and Negative Samples are Directly Learnable via Cooperative-adversarial Contrastive Learning | Code | 1
Weak Augmentation Guided Relational Self-Supervised Learning | Code | 1
Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision | Code | 0
Context Autoencoder for Self-Supervised Representation Learning | Code | 2
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework | Code | 0
When Do Flat Minima Optimizers Work? | Code | 1
Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet? | Code | 0
Max-Margin Contrastive Learning | Code | 1
Masked Feature Prediction for Self-Supervised Visual Pre-Training | Code | 1
Similarity Contrastive Estimation for Self-Supervised Soft Contrastive Learning | Code | 1
PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers | Code | 1
SimMIM: A Simple Framework for Masked Image Modeling | Code | 1
iBOT: Image BERT Pre-Training with Online Tokenizer | Code | 1
Masked Autoencoders Are Scalable Vision Learners | Code | 1
Self-Supervised Learning by Estimating Twin Class Distributions | Code | 1

No leaderboard results yet.