MMC: Advancing Multimodal Chart Understanding with Large-scale Instruction Tuning
Fuxiao Liu, Xiaoyang Wang, Wenlin Yao, Jianshu Chen, Kaiqiang Song, Sangwoo Cho, Yaser Yacoob, Dong Yu
Code
- github.com/fuxiaoliu/mmc (official, in paper; PyTorch; ★ 95)
- github.com/tianyi-lab/hallusionbench (★ 335)
- github.com/FuxiaoLiu/LRV-Instruction (PyTorch; ★ 296)
- github.com/FuxiaoLiu/VisualNews-Repository (PyTorch; ★ 105)
- github.com/yangyucheng000/papercode-2/tree/main/mm_MindSpore-main (MindSpore; ★ 0)
- github.com/yangyucheng000/Paper-4/tree/main/mm_MindSpore (MindSpore; ★ 0)
Abstract
With the rapid development of large language models (LLMs) and their integration into large multimodal models (LMMs), there has been impressive progress in zero-shot completion of user-oriented vision-language tasks. However, a gap remains in the domain of chart image understanding due to the distinct abstract components in charts. To address this, we introduce a large-scale MultiModal Chart Instruction (MMC-Instruction) dataset comprising 600k instances supporting diverse tasks and chart types. Leveraging this data, we develop MultiModal Chart Assistant (MMCA), an LMM that achieves state-of-the-art performance on existing chart QA benchmarks. Recognizing the need for a comprehensive evaluation of LMM chart understanding, we also propose a MultiModal Chart Benchmark (MMC-Benchmark), a human-annotated benchmark with nine distinct tasks evaluating reasoning capabilities over charts. Extensive experiments on MMC-Benchmark reveal the limitations of existing LMMs in correctly interpreting charts, even for the most recent GPT-4V model. Our work provides an instruction-tuning methodology and benchmark to advance multimodal understanding of charts. Code and data are available at https://github.com/FuxiaoLiu/MMC.