Big Self-Supervised Models are Strong Semi-Supervised Learners
Ting Chen, Simon Kornblith, Kevin Swersky, Mohammad Norouzi, Geoffrey Hinton
Code
- github.com/google-research/simclr (official, in paper, TensorFlow, ★ 4,464)
- github.com/lightly-ai/lightly (PyTorch, ★ 3,700)
- github.com/sayakpaul/PAWS-TF (TensorFlow, ★ 45)
- github.com/nikheelpandey/TAUP-PyTorch (PyTorch, ★ 15)
- github.com/nikheelpandey/TAUP (PyTorch, ★ 15)
- github.com/parkinkon1/simclr (TensorFlow, ★ 5)
- github.com/mariaauslander/capstone_fall20_irrigation (TensorFlow, ★ 3)
- github.com/ta9ryuWalrus/simclr (TensorFlow, ★ 0)
- github.com/serre-lab/prj_selfsup (TensorFlow, ★ 0)
Abstract
One paradigm for learning from few labeled examples while making best use of a large amount of unlabeled data is unsupervised pretraining followed by supervised fine-tuning. Although this paradigm uses unlabeled data in a task-agnostic way, in contrast to common approaches to semi-supervised learning for computer vision, we show that it is surprisingly effective for semi-supervised learning on ImageNet. A key ingredient of our approach is the use of big (deep and wide) networks during pretraining and fine-tuning. We find that the fewer the labels, the more this approach (task-agnostic use of unlabeled data) benefits from a bigger network. After fine-tuning, the big network can be further improved and distilled into a much smaller one with little loss in classification accuracy by using the unlabeled examples a second time, but in a task-specific way. The proposed semi-supervised learning algorithm can be summarized in three steps: unsupervised pretraining of a big ResNet model using SimCLRv2, supervised fine-tuning on a few labeled examples, and distillation with unlabeled examples for refining and transferring the task-specific knowledge. This procedure achieves 73.9% ImageNet top-1 accuracy with just 1% of the labels (13 labeled images per class) using ResNet-50, a 10× improvement in label efficiency over the previous state-of-the-art. With 10% of labels, ResNet-50 trained with our method achieves 77.5% top-1 accuracy, outperforming standard supervised training with all of the labels.
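The third step above, distillation with unlabeled examples, trains the small student to match the fine-tuned teacher's temperature-scaled class probabilities. The following is a minimal NumPy sketch of that objective (cross-entropy against the teacher's soft targets); the function names, the temperature value, and the toy logits are illustrative assumptions, not taken from the official codebase.

```python
import numpy as np

def softmax(logits, tau=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / tau
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits, student_logits, tau=1.0):
    """Cross-entropy between the teacher's soft targets and the student's
    predictions, averaged over a batch of (unlabeled) examples."""
    p_teacher = softmax(teacher_logits, tau)
    log_p_student = np.log(softmax(student_logits, tau))
    return -(p_teacher * log_p_student).sum(axis=-1).mean()

# Toy batch: 2 unlabeled examples, 3 classes (values are made up).
teacher = np.array([[5.0, 1.0, 0.5], [0.2, 4.0, 0.1]])
student = np.array([[4.0, 1.5, 0.5], [0.0, 3.5, 0.2]])
loss = distillation_loss(teacher, student, tau=2.0)
```

By Gibbs' inequality the loss is minimized when the student's temperature-scaled distribution matches the teacher's, so a student identical to the teacher can never score worse than a mismatched one.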
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| ImageNet - 10% labeled data | SimCLRv2 self-distilled (ResNet-152 x3, SK) | Top 1 Accuracy | 80.9 | — | Unverified |
| ImageNet - 10% labeled data | SimCLRv2 distilled (ResNet-50 x2, SK) | Top 1 Accuracy | 80.2 | — | Unverified |
| ImageNet - 10% labeled data | SimCLRv2 (ResNet-152 x3, SK) | Top 1 Accuracy | 80.1 | — | Unverified |
| ImageNet - 10% labeled data | SimCLRv2 distilled (ResNet-50) | Top 1 Accuracy | 77.5 | — | Unverified |
| ImageNet - 10% labeled data | SimCLRv2 (ResNet-50 x2) | Top 1 Accuracy | 73.9 | — | Unverified |
| ImageNet - 10% labeled data | SimCLRv2 (ResNet-50) | Top 1 Accuracy | 68.4 | — | Unverified |
| ImageNet - 1% labeled data | SimCLRv2 self-distilled (ResNet-152 x3, SK) | Top 1 Accuracy | 76.6 | — | Unverified |
| ImageNet - 1% labeled data | SimCLRv2 distilled (ResNet-50 x2, SK) | Top 1 Accuracy | 75.9 | — | Unverified |
| ImageNet - 1% labeled data | SimCLRv2 (ResNet-152 x3, SK) | Top 1 Accuracy | 74.9 | — | Unverified |
| ImageNet - 1% labeled data | SimCLRv2 distilled (ResNet-50) | Top 1 Accuracy | 73.9 | — | Unverified |
| ImageNet - 1% labeled data | SimCLRv2 (ResNet-50 x2) | Top 1 Accuracy | 66.3 | — | Unverified |
| ImageNet - 1% labeled data | SimCLRv2 (ResNet-50) | Top 1 Accuracy | 57.9 | — | Unverified |