Masked Siamese Networks for Label-Efficient Learning

2022-04-14

Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Florian Bordes, Pascal Vincent, Armand Joulin, Michael Rabbat, Nicolas Ballas

Code

github.com/facebookresearch/msn
Official · In paper · PyTorch · ★ 464

Abstract

We propose Masked Siamese Networks (MSN), a self-supervised learning framework for learning image representations. Our approach matches the representation of an image view containing randomly masked patches to the representation of the original unmasked image. This self-supervised pre-training strategy is particularly scalable when applied to Vision Transformers since only the unmasked patches are processed by the network. As a result, MSNs improve the scalability of joint-embedding architectures, while producing representations of a high semantic level that perform competitively on low-shot image classification. For instance, on ImageNet-1K, with only 5,000 annotated images, our base MSN model achieves 72.4% top-1 accuracy, and with 1% of ImageNet-1K labels, we achieve 75.7% top-1 accuracy, setting a new state-of-the-art for self-supervised learning on this benchmark. Our code is publicly available.
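The scalability claim in the abstract comes from dropping masked patches before they reach the encoder. The sketch below illustrates only that bookkeeping, not the authors' implementation: the 224x224 image size, the 16x16 patch size, the 0.7 mask ratio, and the helper names `random_patch_mask` and `keep_visible` are all assumptions chosen for illustration.

```python
import random


def random_patch_mask(num_patches, mask_ratio, rng):
    """Return the sorted indices of the patches that stay visible."""
    num_keep = max(1, round(num_patches * (1.0 - mask_ratio)))
    return sorted(rng.sample(range(num_patches), num_keep))


def keep_visible(patch_tokens, keep_idx):
    """Drop masked patches entirely, so the encoder never processes them."""
    return [patch_tokens[i] for i in keep_idx]


# Assumed setup (not from the source): 224x224 image, 16x16 patches.
num_patches = (224 // 16) ** 2           # 14 * 14 patch tokens
rng = random.Random(0)                   # fixed seed for repeatability
keep_idx = random_patch_mask(num_patches, mask_ratio=0.7, rng=rng)

tokens = list(range(num_patches))        # stand-in for patch embeddings
visible = keep_visible(tokens, keep_idx) # input to the masked-view branch

# Self-attention cost grows roughly with the square of the token count,
# so the masked branch pays only a fraction of the full-image cost.
attention_cost_fraction = (len(visible) / num_patches) ** 2
print(len(visible), "of", num_patches, "patches are processed")
print("approximate attention cost fraction:", attention_cost_fraction)
```

In a full pipeline the visible tokens would pass through a Vision Transformer, and the resulting representation would be matched against that of the unmasked view. The sketch stops before that step because the abstract does not specify the matching objective.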

Tasks

Image Classification, Self-Supervised Image Classification, Self-Supervised Learning, Semi-Supervised Image Classification

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
ImageNet - 1% labeled data	MSN (ViT-B/4)	Top 1 Accuracy	75.7	—	Unverified
