SOTAVerified

Multimodal Reasoning

Reasoning over multimodal inputs.

Papers

Showing 101–110 of 302 papers

Title | Status | Hype
Learning Compact Vision Tokens for Efficient Large Multimodal Models | Code | 1
How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game | Code | 1
LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images? | Code | 1
MM-BigBench: Evaluating Multimodal Models on Multimodal Content Comprehension Tasks | Code | 1
LLMs can be Dangerous Reasoners: Analyzing-based Jailbreak Attack on Large Language Models | Code | 1
Fine-Grained Visual Entailment | Code | 1
CofiPara: A Coarse-to-fine Paradigm for Multimodal Sarcasm Target Identification with Large Multimodal Models | Code | 1
Will Pre-Training Ever End? A First Step Toward Next-Generation Foundation MLLMs via Self-Improving Systematic Cognition | Code | 1
CutPaste&Find: Efficient Multimodal Hallucination Detector with Visual-aid Knowledge Base |  | 0
Critique Before Thinking: Mitigating Hallucination through Rationale-Augmented Instruction Tuning |  | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4V | Accuracy | 24 |  | Unverified
2 | Gemini Pro | Accuracy | 13.2 |  | Unverified
3 | LLaVa-1.5-13B | Accuracy | 1.8 |  | Unverified
4 | LLaVa-1.5-7B | Accuracy | 1.5 |  | Unverified
5 | BLIP2-FLAN-T5-XXL | Accuracy | 0.9 |  | Unverified
6 | QWEN | Accuracy | 0.9 |  | Unverified
7 | CogVLM | Accuracy | 0.9 |  | Unverified
8 | InstructBLIP | Accuracy | 0.6 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GPT4V | Accuracy | 22.76 |  | Unverified
2 | Gemini Pro | Accuracy | 17.66 |  | Unverified
3 | Qwen-VL-Max | Accuracy | 15.59 |  | Unverified
4 | InternLM-XComposer2-VL | Accuracy | 14.54 |  | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4 | Acc | 30.3 |  | Unverified