mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
Qinghao Ye, Haiyang Xu, Jiabo Ye, Ming Yan, Anwen Hu, Haowei Liu, Qi Qian, Ji Zhang, Fei Huang, Jingren Zhou
- github.com/x-plug/mplug-owl (official, referenced in paper; PyTorch; ★ 2,539)
- github.com/X-PLUG/mPLUG-Owl/tree/main/mPLUG-Owl2 (official; PyTorch; ★ 0)
Abstract
Multi-modal Large Language Models (MLLMs) have demonstrated impressive instruction-following abilities across various open-ended tasks. However, previous methods have primarily focused on enhancing multi-modal capabilities. In this work, we introduce mPLUG-Owl2, a versatile multi-modal large language model that effectively leverages modality collaboration to improve performance on both pure-text and multi-modal tasks. mPLUG-Owl2 adopts a modularized network design in which the language decoder acts as a universal interface for managing different modalities. Specifically, it incorporates shared functional modules to facilitate modality collaboration and introduces a modality-adaptive module that preserves modality-specific features. Extensive experiments show that mPLUG-Owl2 generalizes to both text and multi-modal tasks and achieves state-of-the-art performance with a single generic model. Notably, mPLUG-Owl2 is the first MLLM to demonstrate the modality collaboration phenomenon in both pure-text and multi-modal scenarios, setting a pioneering path for the development of future multi-modal foundation models.
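The abstract's combination of shared functional modules with a modality-adaptive module can be illustrated with a small PyTorch sketch. The class below, `ModalityAdaptiveAttention`, is a hypothetical illustration, not the official mPLUG-Owl2 implementation: the class name, the routing via a modality mask, and the choice of which projections are shared are assumptions. The idea shown is that shared query/output projections support modality collaboration, while per-modality LayerNorms and key/value projections preserve modality-specific features.

```python
# Hypothetical sketch of a modality-adaptive attention layer (illustrative only).
# Shared q/o projections model modality collaboration; per-modality LayerNorms
# and k/v projections keep text and visual features separated.
import torch
import torch.nn as nn


class ModalityAdaptiveAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        # Shared modules (collaboration across modalities).
        self.q_proj = nn.Linear(dim, dim)
        self.o_proj = nn.Linear(dim, dim)
        # Modality-specific modules (index 0 = text, 1 = visual).
        self.norms = nn.ModuleList([nn.LayerNorm(dim) for _ in range(2)])
        self.k_projs = nn.ModuleList([nn.Linear(dim, dim) for _ in range(2)])
        self.v_projs = nn.ModuleList([nn.Linear(dim, dim) for _ in range(2)])

    def forward(self, x: torch.Tensor, modality_mask: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim); modality_mask: (batch, seq) with 0 = text, 1 = visual.
        b, s, d = x.shape
        normed = torch.zeros_like(x)
        k = torch.zeros_like(x)
        v = torch.zeros_like(x)
        # Route each token through the LayerNorm / K / V of its own modality.
        for m in range(2):
            sel = (modality_mask == m).unsqueeze(-1)
            xm = self.norms[m](x)
            normed = torch.where(sel, xm, normed)
            k = torch.where(sel, self.k_projs[m](xm), k)
            v = torch.where(sel, self.v_projs[m](xm), v)
        q = self.q_proj(normed)

        def split(t: torch.Tensor) -> torch.Tensor:
            # (batch, seq, dim) -> (batch, heads, seq, head_dim)
            return t.view(b, s, self.num_heads, self.head_dim).transpose(1, 2)

        attn = torch.nn.functional.scaled_dot_product_attention(split(q), split(k), split(v))
        out = attn.transpose(1, 2).reshape(b, s, d)
        return self.o_proj(out)
```

Under this sketch, the only extra input at inference time is a mask of zeros and ones marking which positions in the interleaved sequence are text tokens and which are visual tokens.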
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| MMNeedle | mPLUG-Owl-v2 | 1 Image, 4×4 Stitching, Exact Accuracy | 0.3 | — | Unverified |