Autoregressive Image Generation with Randomized Parallel Decoding
Haopeng Li, Jinyue Yang, Guoqi Li, Huan Wang
Code
- github.com/hp-l33/ARPG (official PyTorch implementation, ★ 88)
Abstract
We introduce ARPG, a novel visual autoregressive model that enables randomized parallel generation, addressing the inherent limitations of conventional raster-order approaches, which hinder inference efficiency and zero-shot generalization due to their sequential, predefined token generation order. Our key insight is that effective random-order modeling necessitates explicit guidance for determining the position of the next predicted token. To this end, we propose a guided decoding framework that decouples positional guidance from content representation, encoding them separately as queries and key-value pairs. By directly incorporating this guidance into the causal attention mechanism, our approach enables fully random-order training and generation, eliminating the need for bidirectional attention. Consequently, ARPG readily generalizes to zero-shot tasks such as image inpainting, outpainting, and resolution expansion. Furthermore, it supports parallel inference by concurrently processing multiple queries using a shared KV cache. On the ImageNet-1K 256×256 benchmark, our approach attains an FID of 1.94 with only 64 sampling steps, achieving over a 20-fold increase in throughput while reducing memory consumption by over 75% compared to representative recent autoregressive models at a similar scale.
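The abstract's core mechanism, position-only queries attending over a key-value cache of already-generated content, with several queries processed in parallel against the same cache, can be illustrated with a toy sketch. The following is a minimal, illustrative PyTorch example under stated assumptions, not the official implementation: the `GuidedDecoder` class, `encode_kv` helper, single-head attention, and the start-token handling are all hypothetical simplifications.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GuidedDecoder(nn.Module):
    """Toy decoder: queries encode target positions, keys/values encode generated content."""
    def __init__(self, vocab_size=1024, seq_len=256, d_model=64):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)    # content representation
        self.pos_emb = nn.Embedding(seq_len + 1, d_model)   # positional guidance (+1 reserved start slot)
        self.to_q = nn.Linear(d_model, d_model, bias=False)
        self.to_k = nn.Linear(d_model, d_model, bias=False)
        self.to_v = nn.Linear(d_model, d_model, bias=False)
        self.head = nn.Linear(d_model, vocab_size)

    def encode_kv(self, tokens, positions):
        # Keys/values carry *what* has been generated and *where* it sits.
        h = self.tok_emb(tokens) + self.pos_emb(positions)
        return self.to_k(h), self.to_v(h)

    def forward(self, target_positions, cache_k, cache_v):
        # Queries carry only *where* the next tokens should be predicted.
        q = self.to_q(self.pos_emb(target_positions))               # (B, P, D)
        out = F.scaled_dot_product_attention(q, cache_k, cache_v)   # attend to the shared cache
        return self.head(out)                                       # (B, P, vocab_size)

@torch.no_grad()
def random_order_parallel_sample(model, seq_len=256, tokens_per_step=64, device="cpu"):
    order = torch.randperm(seq_len, device=device)                  # fully random generation order
    out = torch.zeros(1, seq_len, dtype=torch.long, device=device)
    # Seed the shared KV cache with a single start/condition token in the reserved slot.
    start_tok = torch.zeros(1, 1, dtype=torch.long, device=device)
    start_pos = torch.full((1, 1), seq_len, dtype=torch.long, device=device)
    cache_k, cache_v = model.encode_kv(start_tok, start_pos)
    done = 0
    while done < seq_len:
        # Several target positions are queried in parallel against the same KV cache.
        pos = order[done:done + tokens_per_step].unsqueeze(0)       # (1, P)
        logits = model(pos, cache_k, cache_v)
        probs = F.softmax(logits, dim=-1).squeeze(0)                # (P, vocab_size)
        sampled = torch.multinomial(probs, 1).T                     # (1, P)
        out.scatter_(1, pos, sampled)
        # Append the newly decoded tokens to the shared cache for later steps.
        new_k, new_v = model.encode_kv(sampled, pos)
        cache_k = torch.cat([cache_k, new_k], dim=1)
        cache_v = torch.cat([cache_v, new_v], dim=1)
        done += pos.shape[1]
    return out

model = GuidedDecoder()
tokens = random_order_parallel_sample(model)   # (1, 256) token ids filled in random order
```

The sketch only illustrates the decoupling and the shared-cache parallel sampling loop; the paper additionally trains with this guidance inside causal attention, which is omitted here for brevity.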
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| ImageNet 256x256 | ARPG-XXL | FID | 1.94 | — | Unverified |
| ImageNet 256x256 | ARPG-XL | FID | 2.10 | — | Unverified |
| ImageNet 256x256 | ARPG-L | FID | 2.44 | — | Unverified |