Autoregressive Image Generation with Randomized Parallel Decoding
Haopeng Li, Jinyue Yang, Guoqi Li, Huan Wang
Code
- github.com/hp-l33/ARPG (official PyTorch implementation, ★ 88)
Abstract
We introduce ARPG, a novel visual autoregressive model that enables randomized parallel generation, addressing the inherent limitations of conventional raster-order approaches, which hinder inference efficiency and zero-shot generalization due to their sequential, predefined token generation order. Our key insight is that effective random-order modeling necessitates explicit guidance for determining the position of the next predicted token. To this end, we propose a guided decoding framework that decouples positional guidance from content representation, encoding them separately as queries and key-value pairs. By directly incorporating this guidance into the causal attention mechanism, our approach enables fully random-order training and generation, eliminating the need for bidirectional attention. Consequently, ARPG readily generalizes to zero-shot tasks such as image inpainting, outpainting, and resolution expansion. Furthermore, it supports parallel inference by concurrently processing multiple queries using a shared KV cache. On the ImageNet-1K 256×256 benchmark, our approach attains an FID of 1.94 with only 64 sampling steps, achieving over a 20-fold increase in throughput while reducing memory consumption by over 75% compared to representative recent autoregressive models at a similar scale.
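The abstract's core mechanism, position-only queries attending over a key-value cache of already-generated content, with several queries processed in parallel against the same cache, can be illustrated with a toy sketch. The following is a minimal, illustrative PyTorch example under stated assumptions, not the official implementation: the `GuidedDecoder` class, `encode_kv` helper, single-head attention, and the start-token handling are all hypothetical simplifications.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GuidedDecoder(nn.Module):
    """Toy decoder: queries encode target positions, keys/values encode generated content."""
    def __init__(self, vocab_size=1024, seq_len=256, d_model=64):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)    # content representation
        self.pos_emb = nn.Embedding(seq_len + 1, d_model)   # positional guidance (+1 reserved start slot)
        self.to_q = nn.Linear(d_model, d_model, bias=False)
        self.to_k = nn.Linear(d_model, d_model, bias=False)
        self.to_v = nn.Linear(d_model, d_model, bias=False)
        self.head = nn.Linear(d_model, vocab_size)

    def encode_kv(self, tokens, positions):
        # Keys/values carry *what* has been generated and *where* it sits.
        h = self.tok_emb(tokens) + self.pos_emb(positions)
        return self.to_k(h), self.to_v(h)

    def forward(self, target_positions, cache_k, cache_v):
        # Queries carry only *where* the next tokens should be predicted.
        q = self.to_q(self.pos_emb(target_positions))               # (B, P, D)
        out = F.scaled_dot_product_attention(q, cache_k, cache_v)   # attend to the shared cache
        return self.head(out)                                       # (B, P, vocab_size)

@torch.no_grad()
def random_order_parallel_sample(model, seq_len=256, tokens_per_step=64, device="cpu"):
    order = torch.randperm(seq_len, device=device)                  # fully random generation order
    out = torch.zeros(1, seq_len, dtype=torch.long, device=device)
    # Seed the shared KV cache with a single start/condition token in the reserved slot.
    start_tok = torch.zeros(1, 1, dtype=torch.long, device=device)
    start_pos = torch.full((1, 1), seq_len, dtype=torch.long, device=device)
    cache_k, cache_v = model.encode_kv(start_tok, start_pos)
    done = 0
    while done < seq_len:
        # Several target positions are queried in parallel against the same KV cache.
        pos = order[done:done + tokens_per_step].unsqueeze(0)       # (1, P)
        logits = model(pos, cache_k, cache_v)
        probs = F.softmax(logits, dim=-1).squeeze(0)                # (P, vocab_size)
        sampled = torch.multinomial(probs, 1).T                     # (1, P)
        out.scatter_(1, pos, sampled)
        # Append the newly decoded tokens to the shared cache for later steps.
        new_k, new_v = model.encode_kv(sampled, pos)
        cache_k = torch.cat([cache_k, new_k], dim=1)
        cache_v = torch.cat([cache_v, new_v], dim=1)
        done += pos.shape[1]
    return out

model = GuidedDecoder()
tokens = random_order_parallel_sample(model)   # (1, 256) token ids filled in random order
```

The sketch only illustrates the decoupling and the shared-cache parallel sampling loop; the paper additionally trains with this guidance inside causal attention, which is omitted here for brevity.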
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| ImageNet 256x256 | ARPG-XXL | FID | 1.94 | — | Unverified |
| ImageNet 256x256 | ARPG-XL | FID | 2.10 | — | Unverified |
| ImageNet 256x256 | ARPG-L | FID | 2.44 | — | Unverified |