Elicit and Enhance: Advancing Multimodal Reasoning in Medical Scenarios

2025-05-29

Linjie Mu, Zhongzhen Huang, Yakun Zhu, Xiangyu Zhao, Shaoting Zhang, Xiaofan Zhang

Abstract

Effective clinical decision-making depends on iterative, multimodal reasoning across diverse sources of evidence. The recent emergence of multimodal reasoning models has significantly transformed the landscape of solving complex tasks. Although such models have achieved notable success in mathematics and science, their application to medical domains remains underexplored. In this work, we propose MedE^2, a two-stage post-training pipeline that elicits and then enhances multimodal reasoning for medical domains. In Stage-I, we fine-tune models using 2,000 text-only data samples containing precisely orchestrated reasoning demonstrations to elicit reasoning behaviors. In Stage-II, we further enhance the model's reasoning capabilities using 1,500 rigorously curated multimodal medical cases, aligning model reasoning outputs with our proposed multimodal medical reasoning preference. Extensive experiments demonstrate the efficacy and reliability of MedE^2 in improving the reasoning performance of medical multimodal models. Notably, models trained with MedE^2 consistently outperform baselines across multiple medical multimodal benchmarks. Additional validation on larger models and under inference-time scaling further confirms the robustness and practical utility of our approach.

Tasks

Multimodal Reasoning
