SOTAVerified

Improved Transformer for High-Resolution GANs

2021-06-14 · NeurIPS 2021 · Code Available

Long Zhao, Zizhao Zhang, Ting Chen, Dimitris N. Metaxas, Han Zhang

Abstract

Attention-based models, exemplified by the Transformer, can effectively model long-range dependencies, but they suffer from the quadratic complexity of the self-attention operation, making them difficult to adopt for high-resolution image generation with Generative Adversarial Networks (GANs). In this paper, we introduce two key ingredients to the Transformer to address this challenge. First, in the low-resolution stages of the generative process, standard global self-attention is replaced with the proposed multi-axis blocked self-attention, which allows efficient mixing of local and global attention. Second, in the high-resolution stages, we drop self-attention entirely, keeping only multi-layer perceptrons reminiscent of implicit neural functions. To further improve performance, we introduce an additional self-modulation component based on cross-attention. The resulting model, denoted HiT, has nearly linear computational complexity with respect to image size and thus scales directly to synthesizing high-definition images. Our experiments show that the proposed HiT achieves state-of-the-art FID scores of 30.83 and 2.95 on unconditional ImageNet 128×128 and FFHQ 256×256, respectively, with reasonable throughput. We believe the proposed HiT is an important milestone toward generators in GANs that are completely free of convolutions. Our code is publicly available at https://github.com/google-research/hit-gan.
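To make the first ingredient concrete, here is a minimal sketch of the multi-axis blocked self-attention idea in PyTorch. This is not the authors' implementation (their code lives at the repository above); the block size, the even channel split between the two attention axes, and the projection-free `attn` helper are simplifying assumptions for illustration. Half of the channels attend within each local block, while the other half attend across blocks at the same intra-block position, which is how local and global attention get mixed at low resolutions.

```python
import torch

def attn(t):
    # Plain softmax attention with q = k = v and no learned projections
    # (a simplification; a real model would use multi-head projections).
    w = torch.softmax(t @ t.transpose(-2, -1) / t.shape[-1] ** 0.5, dim=-1)
    return w @ t

def multi_axis_blocked_attention(x, block=8):
    """Illustrative multi-axis blocked self-attention.

    x: (B, H, W, C) feature map, with H and W divisible by `block`.
    Channels are split in half: one half runs regional attention
    inside each block-by-block window, the other half runs dilated
    attention across windows, so each axis is far cheaper than
    global (H*W)-by-(H*W) self-attention.
    """
    B, H, W, C = x.shape
    hb, wb, c2 = H // block, W // block, C // 2
    xr, xd = x.split(c2, dim=-1)

    # Regional axis: tokens within the same block attend to each other.
    r = xr.reshape(B, hb, block, wb, block, c2)
    r = r.permute(0, 1, 3, 2, 4, 5).reshape(-1, block * block, c2)
    r = attn(r)
    r = r.reshape(B, hb, wb, block, block, c2)
    r = r.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, c2)

    # Dilated axis: tokens at the same offset in every block attend
    # to each other, giving a sparse form of global attention.
    d = xd.reshape(B, hb, block, wb, block, c2)
    d = d.permute(0, 2, 4, 1, 3, 5).reshape(-1, hb * wb, c2)
    d = attn(d)
    d = d.reshape(B, block, block, hb, wb, c2)
    d = d.permute(0, 3, 1, 4, 2, 5).reshape(B, H, W, c2)

    return torch.cat([r, d], dim=-1)

# Example: a 64x64 feature map with 32 channels keeps its shape.
x = torch.randn(2, 64, 64, 32)
y = multi_axis_blocked_attention(x)
assert y.shape == x.shape
```

For the second ingredient, the paper drops these attention layers altogether at high resolutions and keeps only per-position MLPs, which is what brings the generator's overall complexity down to nearly linear in image size.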

Benchmark Results

| Dataset | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| CelebA 256×256 | HiT-B | FID | 3.39 | | Unverified |
| CelebA-HQ 1024×1024 | HiT-B | FID | 8.83 | | Unverified |
| FFHQ 1024×1024 | HiT-B | FID | 6.37 | | Unverified |
| FFHQ 256×256 | HiT-S | FID | 3.06 | | Unverified |
| FFHQ 256×256 | HiT-B | FID | 2.95 | | Unverified |
| FFHQ 256×256 | HiT-L | FID | 2.58 | | Unverified |
| ImageNet 128×128 | HiT | FID | 30.83 | | Unverified |

Reproductions

No reproductions yet. Be the first to reproduce this paper.