SOTAVerified

Self-Supervised Image Classification

This is the task of image classification using representations learnt with self-supervised learning. Self-supervised methods generally involve a pretext task that is solved to learn a good representation and a loss function to learn with. One example of a loss function is an autoencoder-based loss, where the goal is to reconstruct an image pixel by pixel. A more popular recent example is a contrastive loss, which measures the similarity of sample pairs in a representation space, and where the target can vary instead of being a fixed target to reconstruct (as in the case of autoencoders).
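To make the contrastive loss described above concrete, here is a minimal sketch of an InfoNCE-style loss for a single anchor, written in plain Python. The function names, the use of cosine similarity, and the temperature value are illustrative assumptions, not the formulation of any particular paper listed on this page.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one anchor (hypothetical helper).

    The loss is the negative log-softmax of the anchor-positive similarity
    against the similarities to all candidates (positive plus negatives).
    """
    logits = [cosine_similarity(anchor, positive) / temperature]
    logits += [cosine_similarity(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the max subtracted for numerical stability.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum_exp - logits[0]

anchor = [1.0, 0.0]
# The positive is close to the anchor: the loss should be small.
aligned = info_nce_loss(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
# The "positive" is far from the anchor and a negative is close: the loss should be large.
misaligned = info_nce_loss(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
print(aligned < misaligned)
```

In this sketch the loss drops when the anchor's representation is closer to its positive than to the negatives, which is the sense in which the target varies with the sampled pairs rather than being a fixed reconstruction.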

A common evaluation protocol is to train a linear classifier on top of (frozen) representations learnt by self-supervised methods. The leaderboards for the linear evaluation protocol can be found below. In practice, it is more common to fine-tune features on a downstream task. An alternative evaluation protocol therefore uses semi-supervised learning and fine-tunes on a fraction of the labels. The leaderboards for the fine-tuning protocol can be accessed here.
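The linear evaluation protocol described above can be sketched as follows. This is a toy illustration in plain Python: `frozen_encoder` is a hypothetical stand-in for a pretrained self-supervised model, the data is made up, and the linear classifier is a binary logistic regression trained by gradient descent. Only the classifier's weights are updated; the encoder is never touched.

```python
import math

def frozen_encoder(x):
    """Hypothetical stand-in for a pretrained encoder; it has no trainable state here."""
    return [x[0] + x[1], x[0] - x[1]]

def train_linear_probe(features, labels, lr=0.5, epochs=200):
    """Fit a binary logistic regression on fixed features with batch gradient descent."""
    dim = len(features[0])
    w, b = [0.0] * dim, 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for f, y in zip(features, labels):
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
            for i in range(dim):
                grad_w[i] += (p - y) * f[i]
            grad_b += p - y
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, f):
    """Predict class 1 when the linear score is positive, else class 0."""
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0

# Toy labelled data (made up for illustration).
inputs = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [1.0, 0.9], [0.8, 1.0], [0.9, 0.7]]
labels = [0, 0, 0, 1, 1, 1]

# Features are computed once and kept fixed: the encoder stays frozen.
features = [frozen_encoder(x) for x in inputs]
w, b = train_linear_probe(features, labels)
accuracy = sum(predict(w, b, f) == y for f, y in zip(features, labels)) / len(labels)
print(accuracy)
```

A real evaluation would report accuracy on a held-out test set rather than on the training examples. The fine-tuning protocol differs in that the encoder's parameters are also updated, using only a subset of the labels.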

You may want to read some blog posts before reading the papers and checking the leaderboards:

Contrastive Self-Supervised Learning - Ankesh Anand
The Illustrated Self-Supervised Learning - Amit Chaudhary
Self-supervised learning and computer vision - Jeremy Howard
Self-Supervised Representation Learning - Lilian Weng

There is also Yann LeCun's talk at AAAI-20 which you can watch here (35:00+).

(Image credit: A Simple Framework for Contrastive Learning of Visual Representations)

Title	Date	Tasks	Status	Hype
Vision Transformers Need Registers	Sep 28, 2023	Object Discovery; Self-Supervised Image Classification	Code Available	6
DINOv2: Learning Robust Visual Features without Supervision	Apr 14, 2023	Depth Estimation; Domain Generalization	Code Available	6
Multi-label Cluster Discrimination for Visual Representation Learning	Jul 24, 2024	Contrastive Learning; Image-text Retrieval	Code Available	4
Architecture-Agnostic Masked Image Modeling -- From ViT back to CNN	May 27, 2022	Image Classification; Instance Segmentation	Code Available	4
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities	May 18, 2023	1 Image, 2*2 Stitchi; Action Classification	Code Available	3
Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling	Jan 9, 2023	2D Object Detection; Contrastive Learning	Code Available	3
XCiT: Cross-Covariance Image Transformers	Jun 17, 2021	image-classification; Image Classification	Code Available	3
Momentum Contrast for Unsupervised Visual Representation Learning	Nov 13, 2019	Contrastive Learning; Image Classification	Code Available	3
Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective	Oct 16, 2024	Conditional Image Generation; Image Generation	Code Available	2
Unicom: Universal and Compact Representation Learning for Image Retrieval	Apr 12, 2023	Image Classification; Image Retrieval	Code Available	2
