SOTAVerified

Multimodal Reasoning

Reasoning over multimodal inputs.

Papers

Showing 101-110 of 302 papers

Title | Status | Hype
SATORI-R1: Incentivizing Multimodal Reasoning with Spatial Grounding and Verifiable Rewards | Code | 1
HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning | Code | 1
How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game | Code | 1
Shifting More Attention to Visual Backbone: Query-modulated Refinement Networks for End-to-End Visual Grounding | Code | 1
Thinking Before Looking: Improving Multimodal LLM Reasoning via Mitigating Visual Hallucination | Code | 1
Fine-Grained Visual Entailment | Code | 1
VideoMultiAgents: A Multi-Agent Framework for Video Question Answering | Code | 1
PACS: A Dataset for Physical Audiovisual CommonSense Reasoning | Code | 1
Controllable Contextualized Image Captioning: Directing the Visual Narrative through User-Defined Highlights | Code | 0
FiVL: A Framework for Improved Vision-Language Alignment | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4V | Accuracy | 24 | | Unverified
2 | Gemini Pro | Accuracy | 13.2 | | Unverified
3 | LLaVa-1.5-13B | Accuracy | 1.8 | | Unverified
4 | LLaVa-1.5-7B | Accuracy | 1.5 | | Unverified
5 | BLIP2-FLAN-T5-XXL | Accuracy | 0.9 | | Unverified
6 | QWEN | Accuracy | 0.9 | | Unverified
7 | CogVLM | Accuracy | 0.9 | | Unverified
8 | InstructBLIP | Accuracy | 0.6 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GPT4V | Accuracy | 22.76 | | Unverified
2 | Gemini Pro | Accuracy | 17.66 | | Unverified
3 | Qwen-VL-Max | Accuracy | 15.59 | | Unverified
4 | InternLM-XComposer2-VL | Accuracy | 14.54 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 | Acc | 30.3 | | Unverified