PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models

2024-01-10

Junsong Chen, Yue Wu, Simian Luo, Enze Xie, Sayak Paul, Ping Luo, Hang Zhao, Zhenguo Li

Code

github.com/PixArt-alpha/PixArt-alpha
Official · In paper · PyTorch · ★ 3,283

Abstract

This technical report introduces PIXART-δ, a text-to-image synthesis framework that integrates the Latent Consistency Model (LCM) and ControlNet into the advanced PIXART-α model. PIXART-α is recognized for its ability to generate high-quality images of 1024px resolution through a remarkably efficient training process. The integration of LCM in PIXART-δ significantly accelerates the inference speed, enabling the production of high-quality images in just 2-4 steps. Notably, PIXART-δ achieves a breakthrough 0.5 seconds for generating 1024x1024 pixel images, marking a 7x improvement over PIXART-α. Additionally, PIXART-δ is designed to be efficiently trainable on 32GB V100 GPUs within a single day. With its 8-bit inference capability (von Platen et al., 2023), PIXART-δ can synthesize 1024px images within 8GB GPU memory constraints, greatly enhancing its usability and accessibility. Furthermore, incorporating a ControlNet-like module enables fine-grained control over text-to-image diffusion models. We introduce a novel ControlNet-Transformer architecture, specifically tailored for Transformers, achieving explicit controllability alongside high-quality image generation. As a state-of-the-art, open-source image generation model, PIXART-δ offers a promising alternative to the Stable Diffusion family of models, contributing significantly to text-to-image synthesis.
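
The 2-4 step generation mentioned in the abstract relies on consistency-style sampling, in which a single network call maps a noisy sample directly to a clean one and a few optional re-noising rounds refine it. The sketch below is a toy illustration of generic multistep consistency sampling, not the PIXART-δ implementation: the one-dimensional data distribution, the noise levels EPS and T_MAX, the four-step schedule, and the function names are all assumptions made for the example, and the "model" is a closed-form stand-in rather than a trained network.

```python
import math
import random

EPS = 0.002    # smallest noise level (assumed value for this toy)
T_MAX = 80.0   # largest noise level (assumed value for this toy)
MU = 1.5       # the toy data distribution is a single point at MU


def toy_consistency_fn(x, t):
    """Closed-form stand-in for a trained consistency model.

    When all data sits at MU, the deterministic trajectory through (x, t)
    is a straight line towards MU, so its point at noise level EPS is
    MU + (EPS / t) * (x - MU). At t == EPS this returns x unchanged,
    which is the boundary condition a consistency function must satisfy.
    """
    return MU + (EPS / t) * (x - MU)


def multistep_consistency_sample(f, timesteps, rng):
    """Draw one sample using len(timesteps) calls to f.

    timesteps must be decreasing noise levels; the first is the level
    at which sampling starts from pure noise.
    """
    x = rng.gauss(0.0, timesteps[0])   # start from noise at the largest level
    x = f(x, timesteps[0])             # first call: jump straight to a clean estimate
    for t in timesteps[1:]:
        z = rng.gauss(0.0, 1.0)
        x_noisy = x + math.sqrt(t * t - EPS * EPS) * z  # re-noise to level t
        x = f(x_noisy, t)                               # denoise again in one call
    return x


rng = random.Random(0)
four_step_schedule = [T_MAX, 20.0, 5.0, 1.0]  # hypothetical 4-step schedule
samples = [
    multistep_consistency_sample(toy_consistency_fn, four_step_schedule, rng)
    for _ in range(5)
]
```

Each sample costs exactly four function evaluations, which is the sense in which few-step sampling is faster than a conventional diffusion sampler that needs tens of denoising steps. In this toy every sample lands very close to MU because the stand-in function is exact; a trained model would only approximate this behaviour.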

Tasks

GPU, Image Generation, Text-to-Image Generation

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
GenEval	PIXART-δ	Overall	0	—	Unverified
