MMC: Advancing Multimodal Chart Understanding with Large-scale Instruction Tuning
Fuxiao Liu, Xiaoyang Wang, Wenlin Yao, Jianshu Chen, Kaiqiang Song, Sangwoo Cho, Yaser Yacoob, Dong Yu
Code
- github.com/fuxiaoliu/mmc (official, in paper; PyTorch; ★ 95)
- github.com/tianyi-lab/hallusionbench (★ 335)
- github.com/FuxiaoLiu/LRV-Instruction (PyTorch; ★ 296)
- github.com/FuxiaoLiu/VisualNews-Repository (PyTorch; ★ 105)
- github.com/yangyucheng000/papercode-2/tree/main/mm_MindSpore-main (MindSpore; ★ 0)
- github.com/yangyucheng000/Paper-4/tree/main/mm_MindSpore (MindSpore; ★ 0)
Abstract
With the rapid development of large language models (LLMs) and their integration into large multimodal models (LMMs), there has been impressive progress in zero-shot completion of user-oriented vision-language tasks. However, a gap remains in the domain of chart image understanding due to the distinct abstract components in charts. To address this, we introduce a large-scale MultiModal Chart Instruction (MMC-Instruction) dataset comprising 600k instances supporting diverse tasks and chart types. Leveraging this data, we develop MultiModal Chart Assistant (MMCA), an LMM that achieves state-of-the-art performance on existing chart QA benchmarks. Recognizing the need for a comprehensive evaluation of LMM chart understanding, we also propose a MultiModal Chart Benchmark (MMC-Benchmark), a human-annotated benchmark with nine distinct tasks evaluating reasoning capabilities over charts. Extensive experiments on MMC-Benchmark reveal the limitations of existing LMMs in correctly interpreting charts, even for the most recent GPT-4V model. Our work provides an instruction-tuning methodology and benchmark to advance multimodal understanding of charts. Code and data are available at https://github.com/FuxiaoLiu/MMC.