SOTAVerified

Multimodal Reasoning

Reasoning over multimodal inputs.

Papers

Showing 126-150 of 302 papers

Title | Status | Hype
MMMG: A Massive, Multidisciplinary, Multi-Tier Generation Benchmark for Text-to-Image Reasoning | | 0
FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal Large Language Models | | 0
Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning | | 0
Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts | | 0
ChartReasoner: Code-Driven Modality Bridging for Long-Chain Reasoning in Chart Question Answering | | 0
Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency | | 0
KokushiMD-10: Benchmark for Evaluating Large Language Models on Ten Japanese National Healthcare Licensing Examinations | | 0
Decoupling the Image Perception and Multimodal Reasoning for Reasoning Segmentation with Digital Twin Representations | | 0
Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation | | 0
MuSciClaims: Multimodal Scientific Claim Verification | | 0
Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning | | 0
RSVP: Reasoning Segmentation via Visual Prompting and Multi-modal Chain-of-Thought | | 0
MMR-V: What's Left Unsaid? A Benchmark for Multimodal Deep Reasoning in Videos | | 0
SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning | | 0
GThinker: Towards General Multimodal Reasoning via Cue-Guided Rethinking | Code | 0
MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM | | 0
Infi-Med: Low-Resource Medical MLLMs with Robust Reasoning Evaluation | | 0
Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought | | 0
Preemptive Hallucination Reduction: An Input-Level Approach for Multimodal Language Model | | 0
GAM-Agent: Game-Theoretic and Uncertainty-Aware Collaboration for Complex Visual Reasoning | | 0
MMBoundary: Advancing MLLM Knowledge Boundary Awareness through Reasoning Step Confidence Calibration | Code | 0
Elicit and Enhance: Advancing Multimodal Reasoning in Medical Scenarios | | 0
VisualSphinx: Large-Scale Synthetic Vision Logic Puzzles for RL | | 0
Infi-MMR: Curriculum-based Unlocking Multimodal Reasoning via Phased Reinforcement Learning in Multimodal Small Language Models | | 0
SAM-R1: Leveraging SAM for Reward Feedback in Multimodal Segmentation via Reinforcement Learning | | 0
Page 6 of 13

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4V | Accuracy | 24 | | Unverified
2 | Gemini Pro | Accuracy | 13.2 | | Unverified
3 | LLaVa-1.5-13B | Accuracy | 1.8 | | Unverified
4 | LLaVa-1.5-7B | Accuracy | 1.5 | | Unverified
5 | BLIP2-FLAN-T5-XXL | Accuracy | 0.9 | | Unverified
6 | QWEN | Accuracy | 0.9 | | Unverified
7 | CogVLM | Accuracy | 0.9 | | Unverified
8 | InstructBLIP | Accuracy | 0.6 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GPT4V | Accuracy | 22.76 | | Unverified
2 | Gemini Pro | Accuracy | 17.66 | | Unverified
3 | Qwen-VL-Max | Accuracy | 15.59 | | Unverified
4 | InternLM-XComposer2-VL | Accuracy | 14.54 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 | Acc | 30.3 | | Unverified