SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis
Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, Robin Rombach
Code Available
- github.com/compvis/fm-boosting (PyTorch, ★ 256)
- github.com/yuchen413/text2image_safety (PyTorch, ★ 197)
- github.com/vision-xl/codes (PyTorch, ★ 37)
- github.com/bytedance/cascadev (PyTorch, ★ 35)
- github.com/benearnthof/fm_boosting (PyTorch, ★ 7)
- github.com/wellzline/protip (PyTorch, ★ 6)
- github.com/andrew-miao/RPO (PyTorch, ★ 5)
- github.com/tillmannohm/fruit-SALAD (PyTorch, ★ 4)
Abstract
We present SDXL, a latent diffusion model for text-to-image synthesis. Compared to previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone: the increase in model parameters is mainly due to more attention blocks and a larger cross-attention context, as SDXL uses a second text encoder. We design multiple novel conditioning schemes and train SDXL on multiple aspect ratios. We also introduce a refinement model which improves the visual fidelity of samples generated by SDXL via a post-hoc image-to-image technique. We demonstrate that SDXL shows drastically improved performance compared to previous versions of Stable Diffusion and achieves results competitive with those of black-box state-of-the-art image generators. In the spirit of promoting open research and fostering transparency in large model training and evaluation, we provide access to code and model weights at https://github.com/Stability-AI/generative-models
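The "novel conditioning schemes" mentioned above include conditioning the UNet on the original image size of each training example, embedded the same way diffusion timesteps are and folded into the timestep embedding. Below is a minimal sketch of that idea, assuming sinusoidal (Fourier) embeddings of height and width; the function names `fourier_embed` and `size_conditioning` are illustrative, not the authors' implementation.

```python
import numpy as np

def fourier_embed(value: float, dim: int = 256, max_period: float = 10000.0) -> np.ndarray:
    """Sinusoidal embedding of a scalar, in the style of diffusion timestep embeddings."""
    half = dim // 2
    # Geometrically spaced frequencies from 1 down to 1/max_period.
    freqs = np.exp(-np.log(max_period) * np.arange(half) / half)
    args = value * freqs
    return np.concatenate([np.cos(args), np.sin(args)])

def size_conditioning(height: int, width: int, dim: int = 256) -> np.ndarray:
    """Embed each size component separately and concatenate.

    In SDXL-style conditioning, a vector like this is added to the timestep
    embedding that modulates the UNet, letting the model learn how apparent
    resolution affects image statistics.
    """
    return np.concatenate([fourier_embed(height, dim), fourier_embed(width, dim)])

cond = size_conditioning(1024, 1024)
print(cond.shape)  # (512,)
```

The same mechanism extends naturally to other scalar conditions (e.g. crop coordinates or target aspect ratio): each scalar gets its own Fourier embedding, and the embeddings are concatenated before being added to the timestep embedding.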
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| WISE | stable-diffusion-xl-base-0.9 | Overall | 0.43 | — | Unverified |