R1-OneVision: An Open-Source Multimodal Large Language Model Capable of Deep Reasoning
2025-02-24
Yi Yang*, Xiaoxuan He*, Hongkun Pan*, Xiyan Jiang, Yan Deng, Xingtao Yang, Haoyu Lu, Minfeng Zhu†, Bo Zhang†, Wei Chen†
- github.com/Fancy-MLLM/R1-onevision (PyTorch, ★ 577)
Abstract
R1-OneVision is a versatile multimodal large language model designed to tackle complex visual reasoning tasks. It seamlessly integrates visual and textual information to produce precise interpretations of multimodal inputs, excelling in areas such as mathematics, science, deep image understanding, and logical reasoning. With its robust multimodal reasoning ability, R1-OneVision serves as a powerful AI assistant capable of addressing a wide range of problem-solving challenges across diverse domains.