Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling

2025-01-29

Xiaokang Chen, Zhiyu Wu, Xingchao Liu, Zizheng Pan, Wen Liu, Zhenda Xie, Xingkai Yu, Chong Ruan


Code

github.com/deepseek-ai/janus
Official, in paper, PyTorch, ★ 17,712

Abstract

In this work, we introduce Janus-Pro, an advanced version of the previous work Janus. Specifically, Janus-Pro incorporates (1) an optimized training strategy, (2) expanded training data, and (3) scaling to larger model size. With these improvements, Janus-Pro achieves significant advancements in both multimodal understanding and text-to-image instruction-following capabilities, while also enhancing the stability of text-to-image generation. We hope this work will inspire further exploration in the field. Code and models are publicly available.

Tasks

Image Generation, Instruction Following, Text to Image Generation, Text-to-Image Generation, Visual Question Answering

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
WISE	Janus-Pro	Overall	0.35	—	Unverified
WISE	Janus	Overall	0.23	—	Unverified
