SOTAVerified

Visual Reasoning

The ability to understand the actions and reasoning associated with visual images.

Papers

Showing 651-698 of 698 papers

| Title | Status | Hype |
| --- | --- | --- |
| Language-Vision Planner and Executor for Text-to-Visual Reasoning | | 0 |
| FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving | | 0 |
| From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation | | 0 |
| LaViPlan : Language-Guided Visual Path Planning with RLVR | | 0 |
| VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning | | 0 |
| VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning | | 0 |
| From Visual to Acoustic Question Answering | | 0 |
| VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models | | 0 |
| ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models | | 0 |
| What Makes a Maze Look Like a Maze? | | 0 |
| Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning | | 0 |
| From Shallow to Deep: Compositional Reasoning over Graphs for Visual Question Answering | | 0 |
| From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration | | 0 |
| Learning Rope Manipulation Policies Using Dense Object Descriptors Trained on Synthetic Depth Data | | 0 |
| Learning to Act Properly: Predicting and Explaining Affordances from Images | | 0 |
| Learning to Agree on Vision Attention for Visual Commonsense Reasoning | | 0 |
| Learning to Collocate Neural Modules for Image Captioning | | 0 |
| Are Elephants Bigger than Butterflies? Reasoning about Sizes of Objects | | 0 |
| Learning to Compose and Reason with Language Tree Structures for Visual Grounding | | 0 |
| From Code to Compliance: Assessing ChatGPT's Utility in Designing an Accessible Webpage -- A Case Study | | 0 |
| VISREAS: Complex Visual Reasoning with Unanswerable Questions | | 0 |
| Foundation Models for Zero-Shot Segmentation of Scientific Images without AI-Ready Data | | 0 |
| Learning to Reason Iteratively and Parallelly for Complex Visual Reasoning Scenarios | | 0 |
| Are Disentangled Representations Helpful for Abstract Visual Reasoning? | | 0 |
| Learning to Stop Overthinking at Test Time | | 0 |
| ForgeryGPT: Multimodal Large Language Model For Explainable Image Forgery Detection and Localization | | 0 |
| Abstract Visual Reasoning Enabled by Language | | 0 |
| Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering | | 0 |
| Visual Agentic AI for Spatial Reasoning with a Dynamic API | | 0 |
| Visual Analytics of Neuron Vulnerability to Adversarial Attacks on Convolutional Neural Networks | | 0 |
| Lexical Conceptual Structure of Literal and Metaphorical Spatial Language: A Case Study of "Push" | | 0 |
| lilGym: Natural Language Visual Reasoning with Reinforcement Learning | | 0 |
| Filling in the details: Perceiving from low fidelity images | | 0 |
| Few-shot Visual Reasoning with Meta-analogical Contrastive Learning | | 0 |
| Few-shot Subgoal Planning with Language Models | | 0 |
| LLMs Are Not Yet Ready for Deepfake Image Detection | | 0 |
| Localizing Before Answering: A Hallucination Evaluation Benchmark for Grounded Medical Multimodal LLMs | | 0 |
| LogicAD: Explainable Anomaly Detection via VLM-based Text Feature Extraction | | 0 |
| Few-Shot Abstract Visual Reasoning With Spectral Features | | 0 |
| LOIS: Looking Out of Instance Semantics for Visual Question Answering | | 0 |
| LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception | | 0 |
| Look, Remember and Reason: Grounded reasoning in videos with language models | | 0 |
| An in-depth experimental study of sensor usage and visual reasoning of robots navigating in real environments | | 0 |
| LVLM_CSP: Accelerating Large Vision Language Models via Clustering, Scattering, and Pruning for Reasoning Segmentation | | 0 |
| Factorization of View-Object Manifolds for Joint Object Recognition and Pose Estimation | | 0 |
| Eyeballing Combinatorial Problems: A Case Study of Using Multimodal Large Language Models to Solve Traveling Salesman Problems | | 0 |
| Explicit Knowledge Incorporation for Visual Reasoning | | 0 |
| MagiC: Evaluating Multimodal Cognition Toward Grounded Visual Reasoning | | 0 |

Benchmark Results

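| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4o + CA | Text Score | 75.5 | | Unverified |
| 2 | GPT-4V (CoT, pick b/w two options) | Text Score | 75.25 | | Unverified |
| 3 | GPT-4V (pick b/w two options) | Text Score | 69.25 | | Unverified |
| 4 | MMICL + CoCoT | Text Score | 64.25 | | Unverified |
| 5 | GPT-4V + CoCoT | Text Score | 58.5 | | Unverified |
| 6 | OpenFlamingo + CoCoT | Text Score | 58.25 | | Unverified |
| 7 | GPT-4V | Text Score | 54.5 | | Unverified |
| 8 | FIBER (EqSim) | Text Score | 51.5 | | Unverified |
| 9 | FIBER (finetuned, Flickr30k) | Text Score | 51.25 | | Unverified |
| 10 | MMICL + CCoT | Text Score | 51 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | BEiT-3 | Accuracy | 91.51 | | Unverified |
| 2 | X2-VLM (large) | Accuracy | 88.7 | | Unverified |
| 3 | XFM (base) | Accuracy | 87.6 | | Unverified |
| 4 | X2-VLM (base) | Accuracy | 86.2 | | Unverified |
| 5 | CoCa | Accuracy | 86.1 | | Unverified |
| 6 | VLMo | Accuracy | 85.64 | | Unverified |
| 7 | VK-OOD | Accuracy | 84.6 | | Unverified |
| 8 | SimVLM | Accuracy | 84.53 | | Unverified |
| 9 | X-VLM (base) | Accuracy | 84.41 | | Unverified |
| 10 | VK-OOD | Accuracy | 83.9 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | BEiT-3 | Accuracy | 92.58 | | Unverified |
| 2 | X2-VLM (large) | Accuracy | 89.4 | | Unverified |
| 3 | XFM (base) | Accuracy | 88.4 | | Unverified |
| 4 | X2-VLM (base) | Accuracy | 87 | | Unverified |
| 5 | CoCa | Accuracy | 87 | | Unverified |
| 6 | VLMo | Accuracy | 86.86 | | Unverified |
| 7 | SimVLM | Accuracy | 85.15 | | Unverified |
| 8 | X-VLM (base) | Accuracy | 84.76 | | Unverified |
| 9 | BLIP-129M | Accuracy | 83.09 | | Unverified |
| 10 | ALBEF (14M) | Accuracy | 82.55 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | AI Core | Average-per ques. | 95.24 | | Unverified |
| 2 | redherring | Average-per ques. | 91.14 | | Unverified |
| 3 | VRDP | Average-per ques. | 90.24 | | Unverified |
| 4 | Fightttttt | Average-per ques. | 88.71 | | Unverified |
| 5 | neural | Average-per ques. | 88.27 | | Unverified |
| 6 | NERV | Average-per ques. | 88.05 | | Unverified |
| 7 | DCL | Average-per ques. | 75.52 | | Unverified |
| 8 | troublesolver | Average-per ques. | 73.3 | | Unverified |
| 9 | v0.1 | Average-per ques. | 73.1 | | Unverified |
| 10 | First_test | Average-per ques. | 69.65 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Gemini-2.0 + CA | 2-Class Accuracy | 93.6 | | Unverified |
| 2 | GPT-4o + CA | 2-Class Accuracy | 92.8 | | Unverified |
| 3 | Human | 2-Class Accuracy | 91 | | Unverified |
| 4 | SNAIL | 2-Class Accuracy | 64 | | Unverified |
| 5 | InstructBLIP + GPT-4 | 2-Class Accuracy | 63.8 | | Unverified |
| 6 | BLIP-2 + ChatGPT (Fine-tuned) | 2-Class Accuracy | 63.3 | | Unverified |
| 7 | InstructBLIP + ChatGPT + Neuro-Symbolic | 2-Class Accuracy | 55.5 | | Unverified |
| 8 | ChatCaptioner + ChatGPT | 2-Class Accuracy | 49.3 | | Unverified |
| 9 | Otter | 2-Class Accuracy | 49.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Humans | Jaccard Index | 90 | | Unverified |
| 2 | ViLT (Zero-Shot) | Jaccard Index | 52 | | Unverified |
| 3 | X-VLM (Zero-Shot) | Jaccard Index | 46 | | Unverified |
| 4 | CLIP-ViT-B/32 (Zero-Shot) | Jaccard Index | 41 | | Unverified |
| 5 | CLIP-ViT-L/14 (Zero-Shot) | Jaccard Index | 40 | | Unverified |
| 6 | CLIP-RN50x64/14 (Zero-Shot) | Jaccard Index | 38 | | Unverified |
| 7 | CLIP-RN50 (Zero-Shot) | Jaccard Index | 35 | | Unverified |
| 8 | CLIP-ViL (Zero-Shot) | Jaccard Index | 15 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | LXMERT | accuracy | 70.1 | | Unverified |
| 2 | ViLT | accuracy | 69.3 | | Unverified |
| 3 | CLIP (finetuned) | accuracy | 65.1 | | Unverified |
| 4 | CLIP (frozen) | accuracy | 56 | | Unverified |
| 5 | VisualBERT | accuracy | 55.2 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | RPIN | AUCCESS | 42.2 | | Unverified |
| 2 | Dec[Joint]1f | AUCCESS | 40.3 | | Unverified |
| 3 | Dynamics-Aware DQN | AUCCESS | 39.9 | | Unverified |
| 4 | DQN | AUCCESS | 36.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | RPIN | AUCCESS | 85.2 | | Unverified |
| 2 | Dynamics-Aware DQN | AUCCESS | 85.2 | | Unverified |
| 3 | Dec[Joint]1f | AUCCESS | 80 | | Unverified |
| 4 | DQN | AUCCESS | 77.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Swin | 1:1 Accuracy | 52.9 | | Unverified |
| 2 | ConvNeXt | 1:1 Accuracy | 51.2 | | Unverified |
| 3 | ViT | 1:1 Accuracy | 50.3 | | Unverified |
| 4 | DEiT | 1:1 Accuracy | 47.2 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Humans | 1-of-100 Accuracy | 100 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | VisualBERT | Accuracy (Dev) | 67.4 | | Unverified |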