R1-Onevision: An Open-Source Multimodal Large Language Model Capable of Deep Reasoning

2025-02-24 · ongoing 2025 · Code Available

Yi Yang*, Xiaoxuan He*, Hongkun Pan*, Xiyan Jiang, Yan Deng, Xingtao Yang, Haoyu Lu, Minfeng Zhu†, Bo Zhang†, Wei Chen†


Code

github.com/Fancy-MLLM/R1-onevision
PyTorch · ★ 577

Abstract

R1-OneVision is a versatile multimodal large language model designed for complex visual reasoning tasks. It integrates visual and textual data to interpret multimodal information precisely, and excels in mathematics, science, deep image understanding, and logical reasoning. Its multimodal reasoning ability makes R1-OneVision an AI assistant suited to problem solving across a wide range of domains.

Tasks

Language Modelling, Large Language Model, Logical Reasoning, Multimodal Large Language Model, Multimodal Reasoning, Visual Reasoning
