Chart-R1: Chain-of-Thought Supervision and Reinforcement for Advanced Chart Reasoner

2026-03-16Unverified0· sign in to hype

Lei Chen, Xuanle Zhao, Zhixiong Zeng, Jing Huang, Yufeng Zhong, Lin Ma

Unverified — Be the first to reproduce this paper.

Abstract

Chart reasoning presents unique challenges due to its inherent complexity -- requiring precise numerical comprehension, multi-level visual understanding, and logical inference across interconnected data elements. Existing vision-language models often struggle with such reasoning tasks, particularly when handling multi-subchart scenarios and numerical sensitivity. To address these challenges, we introduce Chart-R1, a chart-domain vision-language model that leverages reinforcement fine-tuning for advanced chart reasoning. We first propose a programmatic data synthesis approach to generate high-quality step-by-step reasoning data with verifiable answer formats, covering diverse chart types and complexity levels. Our two-stage training strategy includes: (1) Chart-COT, which decomposes complex reasoning into interpretable subtasks through chain-of-thought supervision, and (2) Chart-RFT, which employs group relative policy optimization with numerically sensitive rewards tailored for chart-specific reasoning. Experiments on open-source benchmarks and our proposed ChartRQA dataset demonstrate that Chart-R1 significantly outperforms existing chart-domain methods and rivals large-scale open/closed-source models.

Chart-R1: Chain-of-Thought Supervision and Reinforcement for Advanced Chart Reasoner

Abstract

Reproductions