Autoregressive Image Generation using Residual Quantization

2022-03-03 · CVPR 2022

Doyup Lee, Chiheon Kim, Saehoon Kim, Minsu Cho, Wook-Shin Han

Code

github.com/kakaobrain/rq-vae-transformer (Official, in paper; PyTorch; ★ 1,010)
github.com/ai-forever/movqgan (PyTorch; ★ 264)
github.com/archinetai/bitcodes-pytorch (PyTorch; ★ 6)
github.com/lucidrains/magvit2-pytorch (PyTorch; ★ 0)

Abstract

For autoregressive (AR) modeling of high-resolution images, vector quantization (VQ) represents an image as a sequence of discrete codes. A short sequence is important because it reduces the computational cost an AR model pays to capture long-range interactions between codes. However, we postulate that, in terms of the rate-distortion trade-off, previous VQ cannot both shorten the code sequence and generate high-fidelity images. In this study, we propose a two-stage framework, consisting of Residual-Quantized VAE (RQ-VAE) and RQ-Transformer, to effectively generate high-resolution images. Given a fixed codebook size, RQ-VAE can precisely approximate the feature map of an image and represent the image as a stacked map of discrete codes. Then, RQ-Transformer learns to predict the quantized feature vector at the next position by predicting the next stack of codes. Thanks to the precise approximation of RQ-VAE, we can represent a 256×256 image as an 8×8 feature map, which lets RQ-Transformer substantially reduce computational costs. Consequently, our framework outperforms existing AR models on various benchmarks of unconditional and conditional image generation. Our approach also samples significantly faster than previous AR models while generating high-quality images.
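The central idea in the abstract, approximating each feature vector by a stack of discrete codes, can be illustrated with a minimal pure-Python sketch of residual quantization. It assumes a single codebook shared across all depths and nearest-neighbour selection by squared Euclidean distance; the function names `residual_quantize` and `quantize_feature_map` and the toy codebook are hypothetical and are not taken from the official repository. At each depth the nearest codeword to the current residual is chosen and subtracted, so later codes refine what earlier codes missed. Applied to an 8×8 feature map, this yields 64 positions, each holding a stack of `depth` codes, which is the unit RQ-Transformer predicts one position at a time.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def residual_quantize(
    z: Sequence[float], codebook: Sequence[Sequence[float]], depth: int
) -> Tuple[List[int], Vector]:
    """Quantize one feature vector z into a stack of `depth` code indices.

    Hypothetical sketch: assumes one codebook shared by every depth.
    Returns (codes, reconstruction), where the reconstruction is the sum
    of the selected codewords.
    """
    residual = list(z)
    recon = [0.0] * len(z)
    codes: List[int] = []
    for _ in range(depth):
        # Pick the codeword closest to what is still unexplained.
        k = min(range(len(codebook)), key=lambda i: sq_dist(residual, codebook[i]))
        codes.append(k)
        # Subtract it so the next depth encodes the leftover residual.
        residual = [r - c for r, c in zip(residual, codebook[k])]
        recon = [a + c for a, c in zip(recon, codebook[k])]
    return codes, recon


def quantize_feature_map(
    fmap: Sequence[Sequence[Sequence[float]]],
    codebook: Sequence[Sequence[float]],
    depth: int,
) -> List[List[List[int]]]:
    """Turn an H x W map of vectors into an H x W map of code stacks."""
    return [[residual_quantize(vec, codebook, depth)[0] for vec in row] for row in fmap]
```

For example, with the one-dimensional toy codebook `[[1.0], [-1.0], [0.5], [-0.5], [0.25], [-0.25], [0.125], [-0.125]]`, quantizing `[0.875]` with `depth=1` selects index 0 and reconstructs `[1.0]`, while `depth=2` selects indices `[0, 7]` and reconstructs `[0.875]` exactly, showing how a deeper stack can tighten the approximation without enlarging the codebook.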

Tasks

Conditional Image Generation · Image Generation · Image Reconstruction · Quantization · Text-to-Image Generation

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
ImageNet 256x256	RQ-Transformer	FID	3.83	—	Unverified
