SOTAVerified

Multimodal Reasoning

Reasoning over multimodal inputs.

Papers

Showing 201-210 of 302 papers

Title | Status | Hype
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning | - | 0
Hints of Prompt: Enhancing Visual Representation for Multimodal LLMs in Autonomous Driving | - | 0
Thinking Before Looking: Improving Multimodal LLM Reasoning via Mitigating Visual Hallucination | Code | 1
LLaVA-CoT: Let Vision Language Models Reason Step-by-Step | Code | 7
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization | - | 0
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level | - | 0
Towards Low-Resource Harmful Meme Detection with LMM Agents | Code | 0
Distill Visual Chart Reasoning Ability from LLMs to MLLMs | Code | 2
Understanding the Role of LLMs in Multimodal Evaluation Benchmarks | Code | 0
Learning to Ground VLMs without Forgetting | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4V | Accuracy | 24 | - | Unverified
2 | Gemini Pro | Accuracy | 13.2 | - | Unverified
3 | LLaVa-1.5-13B | Accuracy | 1.8 | - | Unverified
4 | LLaVa-1.5-7B | Accuracy | 1.5 | - | Unverified
5 | BLIP2-FLAN-T5-XXL | Accuracy | 0.9 | - | Unverified
6 | QWEN | Accuracy | 0.9 | - | Unverified
7 | CogVLM | Accuracy | 0.9 | - | Unverified
8 | InstructBLIP | Accuracy | 0.6 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GPT4V | Accuracy | 22.76 | - | Unverified
2 | Gemini Pro | Accuracy | 17.66 | - | Unverified
3 | Qwen-VL-Max | Accuracy | 15.59 | - | Unverified
4 | InternLM-XComposer2-VL | Accuracy | 14.54 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 | Acc | 30.3 | - | Unverified