SOTAVerified

Visual Reasoning

The ability to understand the actions and reasoning associated with visual images.

Papers

Showing 251–300 of 698 papers

| Title | Status | Hype |
| --- | --- | --- |
| CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions | Code | 0 |
| CLEVR Parser: A Graph Parser Library for Geometric Learning on Language Grounded Image Scenes | Code | 0 |
| CLEVRER: CoLlision Events for Video REpresentation and Reasoning | Code | 0 |
| Solving ARC visual analogies with neural embeddings and vector arithmetic: A generalized method | Code | 0 |
| Smart Home Appliances: Chat with Your Fridge | Code | 0 |
| Socratic Questioning: Learn to Self-guide Multimodal Reasoning in the Wild | Code | 0 |
| STAR-R1: Spacial TrAnsformation Reasoning by Reinforcing Multimodal LLMs | Code | 0 |
| A Distance-preserving Matrix Sketch | Code | 0 |
| Slot Abstractors: Toward Scalable Abstract Visual Reasoning | Code | 0 |
| FigureQA: An Annotated Figure Dataset for Visual Reasoning | Code | 0 |
| Stop Pre-Training: Adapt Visual-Language Models to Unseen Languages | Code | 0 |
| A Dataset and Architecture for Visual Reasoning with a Working Memory | Code | 0 |
| RVTBench: A Benchmark for Visual Reasoning Tasks | Code | 0 |
| SAViR-T: Spatially Attentive Visual Reasoning with Transformers | Code | 0 |
| ChartSketcher: Reasoning with Multimodal Feedback and Reflection for Chart Understanding | Code | 0 |
| Explainable and Explicit Visual Reasoning over Scene Graphs | Code | 0 |
| Revisiting Disentanglement in Downstream Tasks: A Study on Its Necessity for Abstract Visual Reasoning | Code | 0 |
| Raven's Progressive Matrices Completion with Latent Gaussian Process Priors | Code | 0 |
| QLEVR: A Diagnostic Dataset for Quantificational Language and Elementary Visual Reasoning | Code | 0 |
| Predicting Complete 3D Models of Indoor Scenes | Code | 0 |
| Program synthesis performance constrained by non-linear spatial relations in Synthetic Visual Reasoning Test | Code | 0 |
| A Plug-and-Play Method for Rare Human-Object Interactions Detection by Bridging Domain Gap | Code | 0 |
| Enforcing Consistency in Weakly Supervised Semantic Parsing | Code | 0 |
| Physical Reasoning Using Dynamics-Aware Models | Code | 0 |
| Progressive Multi-granular Alignments for Grounded Reasoning in Large Vision-Language Models | Code | 0 |
| Cascaded Mutual Modulation for Visual Reasoning | Code | 0 |
| Orchestrator-Agent Trust: A Modular Agentic AI Visual Classification System with Trust-Aware Orchestration and RAG-Based Reasoning | Code | 0 |
| Answer Questions with Right Image Regions: A Visual Attention Regularization Approach | Code | 0 |
| One Self-Configurable Model to Solve Many Abstract Visual Reasoning Problems | Code | 0 |
| On Erroneous Agreements of CLIP Image Embeddings | Code | 0 |
| Prompting Large Vision-Language Models for Compositional Reasoning | Code | 0 |
| Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation | Code | 0 |
| Object Level Visual Reasoning in Videos | Code | 0 |
| OCR-Reasoning Benchmark: Unveiling the True Capabilities of MLLMs in Complex Text-Rich Image Reasoning | Code | 0 |
| Bottom-Up Shift and Reasoning for Referring Image Segmentation | Code | 0 |
| Multi-Modal Dialogue State Tracking for Playing GuessWhich Game | Code | 0 |
| Multi-Label Contrastive Learning for Abstract Visual Reasoning | Code | 0 |
| Multi-Label Zero-Shot Learning with Structured Knowledge Graphs | Code | 0 |
| MM-PoE: Multiple Choice Reasoning via. Process of Elimination using Multi-Modal Models | Code | 0 |
| Multilevel Hierarchical Network with Multiscale Sampling for Video Question Answering | Code | 0 |
| Odd-One-Out Representation Learning | Code | 0 |
| Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis | Code | 0 |
| A Corpus for Reasoning About Natural Language Grounded in Photographs | Code | 0 |
| Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad? | Code | 0 |
| Mind the GAP: Glimpse-based Active Perception improves generalization and sample efficiency of visual reasoning | Code | 0 |
| KnowZRel: Common Sense Knowledge-based Zero-Shot Relationship Retrieval for Generalised Scene Graph Generation | Code | 0 |
| Deconfounded Visual Grounding | Code | 0 |
| MARVEL: Multidimensional Abstraction and Reasoning through Visual Evaluation and Learning | Code | 0 |
| MCTBench: Multimodal Cognition towards Text-Rich Visual Scenes Benchmark | Code | 0 |
| 'Just because you are right, doesn't mean I am wrong': Overcoming a Bottleneck in the Development and Evaluation of Open-Ended Visual Question Answering (VQA) Tasks | Code | 0 |

Benchmark Results

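| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4o + CA | Text Score | 75.5 | | Unverified |
| 2 | GPT-4V (CoT, pick b/w two options) | Text Score | 75.25 | | Unverified |
| 3 | GPT-4V (pick b/w two options) | Text Score | 69.25 | | Unverified |
| 4 | MMICL + CoCoT | Text Score | 64.25 | | Unverified |
| 5 | GPT-4V + CoCoT | Text Score | 58.5 | | Unverified |
| 6 | OpenFlamingo + CoCoT | Text Score | 58.25 | | Unverified |
| 7 | GPT-4V | Text Score | 54.5 | | Unverified |
| 8 | FIBER (EqSim) | Text Score | 51.5 | | Unverified |
| 9 | FIBER (finetuned, Flickr30k) | Text Score | 51.25 | | Unverified |
| 10 | MMICL + CCoT | Text Score | 51 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | BEiT-3 | Accuracy | 91.51 | | Unverified |
| 2 | X2-VLM (large) | Accuracy | 88.7 | | Unverified |
| 3 | XFM (base) | Accuracy | 87.6 | | Unverified |
| 4 | X2-VLM (base) | Accuracy | 86.2 | | Unverified |
| 5 | CoCa | Accuracy | 86.1 | | Unverified |
| 6 | VLMo | Accuracy | 85.64 | | Unverified |
| 7 | VK-OOD | Accuracy | 84.6 | | Unverified |
| 8 | SimVLM | Accuracy | 84.53 | | Unverified |
| 9 | X-VLM (base) | Accuracy | 84.41 | | Unverified |
| 10 | VK-OOD | Accuracy | 83.9 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | BEiT-3 | Accuracy | 92.58 | | Unverified |
| 2 | X2-VLM (large) | Accuracy | 89.4 | | Unverified |
| 3 | XFM (base) | Accuracy | 88.4 | | Unverified |
| 4 | X2-VLM (base) | Accuracy | 87 | | Unverified |
| 5 | CoCa | Accuracy | 87 | | Unverified |
| 6 | VLMo | Accuracy | 86.86 | | Unverified |
| 7 | SimVLM | Accuracy | 85.15 | | Unverified |
| 8 | X-VLM (base) | Accuracy | 84.76 | | Unverified |
| 9 | BLIP-129M | Accuracy | 83.09 | | Unverified |
| 10 | ALBEF (14M) | Accuracy | 82.55 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | AI Core | Average-per ques. | 95.24 | | Unverified |
| 2 | redherring | Average-per ques. | 91.14 | | Unverified |
| 3 | VRDP | Average-per ques. | 90.24 | | Unverified |
| 4 | Fighttttt | Average-per ques. | 88.71 | | Unverified |
| 5 | neural | Average-per ques. | 88.27 | | Unverified |
| 6 | NERV | Average-per ques. | 88.05 | | Unverified |
| 7 | DCL | Average-per ques. | 75.52 | | Unverified |
| 8 | troublesolver | Average-per ques. | 73.3 | | Unverified |
| 9 | v0.1 | Average-per ques. | 73.1 | | Unverified |
| 10 | First_test | Average-per ques. | 69.65 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Gemini-2.0 + CA | 2-Class Accuracy | 93.6 | | Unverified |
| 2 | GPT-4o + CA | 2-Class Accuracy | 92.8 | | Unverified |
| 3 | Human | 2-Class Accuracy | 91 | | Unverified |
| 4 | SNAIL | 2-Class Accuracy | 64 | | Unverified |
| 5 | InstructBLIP + GPT-4 | 2-Class Accuracy | 63.8 | | Unverified |
| 6 | BLIP-2 + ChatGPT (Fine-tuned) | 2-Class Accuracy | 63.3 | | Unverified |
| 7 | InstructBLIP + ChatGPT + Neuro-Symbolic | 2-Class Accuracy | 55.5 | | Unverified |
| 8 | ChatCaptioner + ChatGPT | 2-Class Accuracy | 49.3 | | Unverified |
| 9 | Otter | 2-Class Accuracy | 49.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Humans | Jaccard Index | 90 | | Unverified |
| 2 | ViLT (Zero-Shot) | Jaccard Index | 52 | | Unverified |
| 3 | X-VLM (Zero-Shot) | Jaccard Index | 46 | | Unverified |
| 4 | CLIP-ViT-B/32 (Zero-Shot) | Jaccard Index | 41 | | Unverified |
| 5 | CLIP-ViT-L/14 (Zero-Shot) | Jaccard Index | 40 | | Unverified |
| 6 | CLIP-RN50x64/14 (Zero-Shot) | Jaccard Index | 38 | | Unverified |
| 7 | CLIP-RN50 (Zero-Shot) | Jaccard Index | 35 | | Unverified |
| 8 | CLIP-ViL (Zero-Shot) | Jaccard Index | 15 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | LXMERT | accuracy | 70.1 | | Unverified |
| 2 | ViLT | accuracy | 69.3 | | Unverified |
| 3 | CLIP (finetuned) | accuracy | 65.1 | | Unverified |
| 4 | CLIP (frozen) | accuracy | 56 | | Unverified |
| 5 | VisualBERT | accuracy | 55.2 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | RPIN | AUCCESS | 42.2 | | Unverified |
| 2 | Dec[Joint]1f | AUCCESS | 40.3 | | Unverified |
| 3 | Dynamics-Aware DQN | AUCCESS | 39.9 | | Unverified |
| 4 | DQN | AUCCESS | 36.8 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | RPIN | AUCCESS | 85.2 | | Unverified |
| 2 | Dynamics-Aware DQN | AUCCESS | 85.2 | | Unverified |
| 3 | Dec[Joint]1f | AUCCESS | 80 | | Unverified |
| 4 | DQN | AUCCESS | 77.6 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Swin | 1:1 Accuracy | 52.9 | | Unverified |
| 2 | ConvNeXt | 1:1 Accuracy | 51.2 | | Unverified |
| 3 | ViT | 1:1 Accuracy | 50.3 | | Unverified |
| 4 | DEiT | 1:1 Accuracy | 47.2 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Humans | 1-of-100 Accuracy | 100 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | VisualBERT | Accuracy (Dev) | 67.4 | | Unverified |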