Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation
Zhuoyan Luo, Fengyuan Shi, Yixiao Ge, Yujiu Yang, LiMin Wang, Ying Shan
- github.com/tencentarc/seed-voken (official, in paper; PyTorch)
- github.com/tencentarc/open-magvit2 (PyTorch)
Abstract
We present Open-MAGVIT2, a family of auto-regressive image generation models ranging from 300M to 1.5B parameters. The Open-MAGVIT2 project produces an open-source replication of Google's MAGVIT-v2 tokenizer, a tokenizer with a super-large codebook (i.e., 2^18 codes), and achieves state-of-the-art reconstruction performance (1.17 rFID) on ImageNet 256×256. Furthermore, we explore its application in plain auto-regressive models and validate its scalability properties. To assist auto-regressive models in predicting with a super-large vocabulary, we factorize it into two sub-vocabularies of different sizes by asymmetric token factorization, and further introduce "next sub-token prediction" to enhance sub-token interaction for better generation quality. We release all models and code to foster innovation and creativity in the field of auto-regressive visual generation.
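The asymmetric factorization described above can be illustrated with a minimal sketch: a code index in the 2^18-entry codebook is split into two sub-token indices drawn from sub-vocabularies of unequal sizes. The particular split below (2^6 × 2^12) and the function names are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of asymmetric token factorization for a 2^18-entry codebook.
# The 6/12-bit split is an assumed example of "two sub-vocabularies of
# different sizes"; the paper's actual split may differ.
K1_BITS, K2_BITS = 6, 12  # sub-vocabulary sizes 2^6 = 64 and 2^12 = 4096


def factorize(token: int) -> tuple[int, int]:
    """Map a code index in [0, 2^18) to a pair of sub-token indices."""
    assert 0 <= token < 1 << (K1_BITS + K2_BITS)
    return token >> K2_BITS, token & ((1 << K2_BITS) - 1)


def defactorize(sub1: int, sub2: int) -> int:
    """Recombine the two sub-tokens into the original code index."""
    return (sub1 << K2_BITS) | sub2
```

With such a split, the model predicts a 64-way and a 4096-way distribution per position instead of one 262,144-way distribution, which is what makes a super-large vocabulary tractable for a plain auto-regressive transformer.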
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| ImageNet 256x256 | Open-MAGVIT2-XL | FID | 2.33 | — | Unverified |