Generative Pretraining from Pixels

2020-07-17 · ICML 2020 · Code Available

Mark Chen, Alec Radford, Rewon Child, Jeff Wu, Heewoo Jun, Prafulla Dhariwal, David Luan, Ilya Sutskever

Code

github.com/openai/image-gpt (Official, TensorFlow, ★ 2,088)
github.com/teddykoker/image-gpt (PyTorch, ★ 260)
github.com/apeguero1/image-gpt/blob/master/Transformers_Image_GPT.ipynb (TensorFlow, ★ 0)
github.com/EugenHotaj/pytorch-generative/blob/master/pytorch_generative/models/autoregressive/image_gpt.py (PyTorch, ★ 0)

Abstract

Inspired by progress in unsupervised representation learning for natural language, we examine whether similar models can learn useful representations for images. We train a sequence Transformer to auto-regressively predict pixels, without incorporating knowledge of the 2D input structure. Despite training on low-resolution ImageNet without labels, we find that a GPT-2 scale model learns strong image representations as measured by linear probing, fine-tuning, and low-data classification. On CIFAR-10, we achieve 96.3% accuracy with a linear probe, outperforming a supervised Wide ResNet, and 99.0% accuracy with full fine-tuning, matching the top supervised pre-trained models. An even larger model trained on a mixture of ImageNet and web images is competitive with self-supervised benchmarks on ImageNet, achieving 72.0% top-1 accuracy on a linear probe of our features.
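
As a rough illustration of the auto-regressive pixel objective described in the abstract, the sketch below flattens a small image into a 1D raster-order sequence and scores next-pixel predictions with a cross-entropy loss. It is a minimal sketch, not the authors' code. The helper names, the 8x8 image size, the 16-value pixel vocabulary, and the uniform logits standing in for a Transformer's output are all assumptions made for illustration.

```python
import numpy as np


def to_sequence(image: np.ndarray) -> np.ndarray:
    """Flatten an (H, W) array of integer pixel tokens into raster order.

    After this step the 2D layout is no longer visible to the model.
    """
    return image.reshape(-1)


def next_pixel_pairs(seq: np.ndarray):
    """Return (inputs, targets) where targets[i] is the token after inputs[i]."""
    return seq[:-1], seq[1:]


def autoregressive_nll(logits: np.ndarray, targets: np.ndarray) -> float:
    """Mean negative log-likelihood of targets under per-position logits (T, V)."""
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())


rng = np.random.default_rng(0)
vocab_size = 16  # hypothetical number of discrete pixel values
image = rng.integers(0, vocab_size, size=(8, 8))  # hypothetical toy image

seq = to_sequence(image)
inputs, targets = next_pixel_pairs(seq)

# Stand-in for a sequence model: uniform logits at every position.
# A trained Transformer would produce position-dependent logits here.
logits = np.zeros((len(inputs), vocab_size))
loss = autoregressive_nll(logits, targets)
print(f"sequence length: {len(seq)}, loss: {loss:.4f}")
```

With uniform logits the loss equals the natural log of the vocabulary size, which gives a baseline that any trained model should fall below.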

Tasks

Image Classification, Representation Learning, Self-Supervised Image Classification
