PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher
Dongjun Kim, Chieh-Hsin Lai, Wei-Hsiang Liao, Yuhta Takida, Naoki Murata, Toshimitsu Uesaka, Yuki Mitsufuji, Stefano Ermon
Code: https://github.com/sony/pagoda (official PyTorch implementation)
Abstract
Diffusion models perform remarkably well at generating high-dimensional content but are computationally intensive, especially during training. We propose Progressive Growing of Diffusion Autoencoder (PaGoDA), a novel pipeline that reduces training costs through three stages: training diffusion on downsampled data, distilling the pretrained diffusion model, and progressive super-resolution. With this pipeline, PaGoDA achieves a 64x reduction in the cost of training its diffusion model on 8x downsampled data; at inference, with a single step, it achieves state-of-the-art results on ImageNet across all resolutions from 64x64 to 512x512, as well as on text-to-image generation. PaGoDA's pipeline can also be applied directly in latent space, adding compression alongside the pretrained autoencoder of Latent Diffusion Models (e.g., Stable Diffusion). The code is available at https://github.com/sony/pagoda.
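The claimed 64x training-cost reduction follows directly from the 8x downsampling: pixel count scales with the square of the spatial resolution, so for a model whose per-step compute is roughly proportional to pixel count, training on 8x-downsampled images cuts the cost by 8^2 = 64. A minimal sketch of that arithmetic (the function name and proportionality assumption are illustrative, not from the paper):

```python
def pixel_count(height: int, width: int, channels: int = 3) -> int:
    """Number of values in an image tensor of the given size."""
    return height * width * channels

# Teacher diffusion trained at 512/8 = 64 instead of the 512 target:
full = pixel_count(512, 512)           # pixels at the target resolution
down = pixel_count(512 // 8, 512 // 8)  # pixels at the 8x-downsampled teacher resolution

# Assuming compute scales roughly linearly with pixel count,
# the cost ratio is the square of the downsampling factor.
print(full // down)  # -> 64
```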
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| ImageNet 32x32 | PaGoDA | FID | 0.79 | — | Unverified |
| ImageNet 64x64 | PaGoDA | FID | 1.21 | — | Unverified |
| ImageNet 128x128 | PaGoDA | FID | 1.48 | — | Unverified |
| ImageNet 256x256 | PaGoDA | FID | 1.56 | — | Unverified |
| ImageNet 512x512 | PaGoDA | FID | 1.80 | — | Unverified |