MaskGIT: Masked Generative Image Transformer

2022-02-08 · CVPR 2022 · Code Available

Huiwen Chang, Han Zhang, Lu Jiang, Ce Liu, William T. Freeman


Code

github.com/google-research/maskgit (official, JAX, ★ 558)
github.com/HKUNLP/Dream (PyTorch, ★ 1,201)
github.com/dome272/MaskGIT-pytorch (PyTorch, ★ 471)
github.com/valeoai/maskgit-pytorch (PyTorch, ★ 282)
github.com/myscience/open-genie (PyTorch, ★ 269)
github.com/LAION-AI/phenaki (PyTorch, ★ 220)
github.com/alibaba/graph-gpt (PyTorch, ★ 102)
github.com/lucidrains/soundstorm-pytorch (PyTorch, ★ 0)

Abstract

Generative transformers have experienced rapid popularity growth in the computer vision community in synthesizing high-fidelity and high-resolution images. The best generative transformer models so far, however, still treat an image naively as a sequence of tokens, and decode an image sequentially following the raster scan ordering (i.e. line-by-line). We find this strategy neither optimal nor efficient. This paper proposes a novel image synthesis paradigm using a bidirectional transformer decoder, which we term MaskGIT. During training, MaskGIT learns to predict randomly masked tokens by attending to tokens in all directions. At inference time, the model begins with generating all tokens of an image simultaneously, and then refines the image iteratively conditioned on the previous generation. Our experiments demonstrate that MaskGIT significantly outperforms the state-of-the-art transformer model on the ImageNet dataset, and accelerates autoregressive decoding by up to 64x. Besides, we illustrate that MaskGIT can be easily extended to various image editing tasks, such as inpainting, extrapolation, and image manipulation.
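
The inference procedure described in the abstract (predict all tokens at once, then refine over several passes) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say how tokens are chosen for refinement, so the cosine mask schedule and the confidence-based selection below are assumptions, and `toy_predict`, `iterative_decode`, and `MASK` are hypothetical names, with `toy_predict` standing in for the bidirectional transformer.

```python
import math
import random

MASK = -1  # hypothetical sentinel id for a masked token


def toy_predict(tokens, vocab_size, rng):
    """Hypothetical stand-in for the bidirectional transformer.

    Returns one (token_id, confidence) pair per position. A real model
    would attend to every unmasked token; this one samples at random.
    """
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]


def iterative_decode(num_tokens, vocab_size, steps, seed=0):
    rng = random.Random(seed)
    tokens = [MASK] * num_tokens  # start from a fully masked canvas
    for t in range(1, steps + 1):
        # Predict every position in parallel.
        preds = toy_predict(tokens, vocab_size, rng)
        # Assumed cosine schedule: how many tokens stay masked after step t.
        keep_masked = math.floor(num_tokens * math.cos(math.pi / 2 * t / steps))
        # Rank the still-masked positions by confidence, highest first.
        masked = [i for i, tok in enumerate(tokens) if tok == MASK]
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        # Commit the most confident predictions; the rest stay masked.
        num_to_fill = max(len(masked) - keep_masked, 0)
        for i in masked[:num_to_fill]:
            tokens[i] = preds[i][0]
    return tokens


if __name__ == "__main__":
    # 16 tokens decoded in 4 parallel passes instead of 16 sequential ones.
    print(iterative_decode(num_tokens=16, vocab_size=10, steps=4))
```

Under these assumptions the number of model calls equals `steps` rather than the number of tokens, which is where a speedup over raster-order autoregressive decoding would come from.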

Tasks

Decoder, Image Generation, Image Manipulation, Image Outpainting, Image Reconstruction, Text-to-Image Generation

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
ImageNet 256x256	MaskGIT	FID	6.18	—	Unverified
ImageNet 256x256	MaskGIT (a=0.05)	FID	4.02	—	Unverified
ImageNet 512x512	MaskGIT (a=0.05)	FID	4.46	—	Unverified
ImageNet 512x512	MaskGIT	FID	7.32	—	Unverified
