R1-Onevision: An Open-Source Multimodal Large Language Model Capable of Deep Reasoning

2025-02-24 · ongoing 2025 · Code Available

Yi Yang*, Xiaoxuan He*, Hongkun Pan*, Xiyan Jiang, Yan Deng, Xingtao Yang, Haoyu Lu, Minfeng Zhu†, Bo Zhang†, Wei Chen†

Abstract

R1-OneVision is a versatile multimodal large language model designed to tackle complex visual reasoning tasks. It integrates visual and textual data to interpret multimodal information precisely, and it excels in areas such as mathematics, science, deep image understanding, and logical reasoning. With this multimodal reasoning ability, R1-OneVision serves as an AI assistant capable of addressing a wide range of problem-solving challenges across different domains.
