SOTAVerified

Visual Reasoning

The ability to understand the actions and reasoning associated with visual images.

Papers

Showing 501–550 of 698 papers

Title | Status | Hype
Socratic-MCTS: Test-Time Visual Reasoning by Asking the Right Questions | | 0
Spatial Knowledge Distillation to aid Visual Reasoning | | 0
SwitchCIT: Switching for Continual Instruction Tuning | | 0
Synthetic Visual Genome | | 0
SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis | | 0
Systematic Abductive Reasoning via Diverse Relation Representations in Vector-symbolic Architecture | | 0
Take A Step Back: Rethinking the Two Stages in Visual Reasoning | | 0
Test-time Distribution Learning Adapter for Cross-modal Visual Reasoning | | 0
TextCaps: a Dataset for Image Captioning with Reading Comprehension | | 0
The Eye of Sherlock Holmes: Uncovering User Private Attribute Profiling via Vision-Language Model Agentic Framework | | 0
The Role of Chain-of-Thought in Complex Vision-Language Reasoning Task | | 0
The role of object-centric representations, guided attention, and external memory on generalizing visual relations | | 0
Think-Program-reCtify: 3D Situated Reasoning with Large Language Models | | 0
Towards A Unified Neural Architecture for Visual Recognition and Reasoning | | 0
Towards Generative Abstract Reasoning: Completing Raven's Progressive Matrix via Rule Abstraction and Selection | | 0
Towards Grounded Visual Spatial Reasoning in Multi-Modal Vision Language Models | | 0
Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers | | 0
Towards Unsupervised Visual Reasoning: Do Off-The-Shelf Features Know How to Reason? | | 0
Towards Visual Discrimination and Reasoning of Real-World Physical Dynamics: Physics-Grounded Anomaly Detection | | 0
Transfer Learning in Visual and Relational Reasoning | | 0
Transformers in Vision: A Survey | | 0
Transformers Utilization in Chart Understanding: A Review of Recent Advances & Future Trends | | 0
Tree-of-Mixed-Thought: Combining Fast and Slow Thinking for Multi-hop Visual Reasoning | | 0
TRRNet: Tiered Relation Reasoning for Compositional Visual Question Answering | | 0
TVBench: Redesigning Video-Language Evaluation | | 0
Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning | | 0
Understanding the computational demands underlying visual reasoning | | 0
Understand, Think, and Answer: Advancing Visual Reasoning with Large Multimodal Models | | 0
Unifying Vision-Language Representation Space with Single-tower Transformer | | 0
Grounded Object Centric Learning | | 0
VALSE: A Task-Independent Benchmark for Vision and Language Models centered on Linguistic Phenomena | | 0
VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity | | 0
VGR: Visual Grounded Reasoning | | 0
Video Captioning Using Weak Annotation | | 0
ViLEM: Visual-Language Error Modeling for Image-Text Retrieval | | 0
VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning | | 0
VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning | | 0
VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models | | 0
Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning | | 0
VISREAS: Complex Visual Reasoning with Unanswerable Questions | | 0
Abstract Visual Reasoning Enabled by Language | | 0
Visual Agentic AI for Spatial Reasoning with a Dynamic API | | 0
Visual Analytics of Neuron Vulnerability to Adversarial Attacks on Convolutional Neural Networks | | 0
Visual Commonsense based Heterogeneous Graph Contrastive Learning | | 0
Visual Entailment: A Novel Task for Fine-Grained Image Understanding | | 0
Visual In-Context Learning for Large Vision-Language Models | | 0
Visual Language Models show widespread visual deficits on neuropsychological tests | | 0
A Continual Learning Paradigm for Non-differentiable Visual Programming Frameworks on Visual Reasoning Tasks | | 0
VisualPuzzles: Decoupling Multimodal Reasoning Evaluation from Domain Knowledge | | 0
Visual Question Answering in the Medical Domain | | 0
Page 11 of 14

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4o + CA | Text Score | 75.5 | | Unverified
2 | GPT-4V (CoT, pick b/w two options) | Text Score | 75.25 | | Unverified
3 | GPT-4V (pick b/w two options) | Text Score | 69.25 | | Unverified
4 | MMICL + CoCoT | Text Score | 64.25 | | Unverified
5 | GPT-4V + CoCoT | Text Score | 58.5 | | Unverified
6 | OpenFlamingo + CoCoT | Text Score | 58.25 | | Unverified
7 | GPT-4V | Text Score | 54.5 | | Unverified
8 | FIBER (EqSim) | Text Score | 51.5 | | Unverified
9 | FIBER (finetuned, Flickr30k) | Text Score | 51.25 | | Unverified
10 | MMICL + CCoT | Text Score | 51 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | BEiT-3 | Accuracy | 91.51 | | Unverified
2 | X2-VLM (large) | Accuracy | 88.7 | | Unverified
3 | XFM (base) | Accuracy | 87.6 | | Unverified
4 | X2-VLM (base) | Accuracy | 86.2 | | Unverified
5 | CoCa | Accuracy | 86.1 | | Unverified
6 | VLMo | Accuracy | 85.64 | | Unverified
7 | VK-OOD | Accuracy | 84.6 | | Unverified
8 | SimVLM | Accuracy | 84.53 | | Unverified
9 | X-VLM (base) | Accuracy | 84.41 | | Unverified
10 | VK-OOD | Accuracy | 83.9 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | BEiT-3 | Accuracy | 92.58 | | Unverified
2 | X2-VLM (large) | Accuracy | 89.4 | | Unverified
3 | XFM (base) | Accuracy | 88.4 | | Unverified
4 | CoCa | Accuracy | 87 | | Unverified
5 | X2-VLM (base) | Accuracy | 87 | | Unverified
6 | VLMo | Accuracy | 86.86 | | Unverified
7 | SimVLM | Accuracy | 85.15 | | Unverified
8 | X-VLM (base) | Accuracy | 84.76 | | Unverified
9 | BLIP-129M | Accuracy | 83.09 | | Unverified
10 | ALBEF (14M) | Accuracy | 82.55 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | AI Core | Average-per ques. | 95.24 | | Unverified
2 | redherring | Average-per ques. | 91.14 | | Unverified
3 | VRDP | Average-per ques. | 90.24 | | Unverified
4 | Fightttttt | Average-per ques. | 88.71 | | Unverified
5 | neural | Average-per ques. | 88.27 | | Unverified
6 | NERV | Average-per ques. | 88.05 | | Unverified
7 | DCL | Average-per ques. | 75.52 | | Unverified
8 | troublesolver | Average-per ques. | 73.3 | | Unverified
9 | v0.1 | Average-per ques. | 73.1 | | Unverified
10 | First_test | Average-per ques. | 69.65 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | Gemini-2.0 + CA | 2-Class Accuracy | 93.6 | | Unverified
2 | GPT-4o + CA | 2-Class Accuracy | 92.8 | | Unverified
3 | Human | 2-Class Accuracy | 91 | | Unverified
4 | SNAIL | 2-Class Accuracy | 64 | | Unverified
5 | InstructBLIP + GPT-4 | 2-Class Accuracy | 63.8 | | Unverified
6 | BLIP-2 + ChatGPT (Fine-tuned) | 2-Class Accuracy | 63.3 | | Unverified
7 | InstructBLIP + ChatGPT + Neuro-Symbolic | 2-Class Accuracy | 55.5 | | Unverified
8 | ChatCaptioner + ChatGPT | 2-Class Accuracy | 49.3 | | Unverified
9 | Otter | 2-Class Accuracy | 49.3 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | Humans | Jaccard Index | 90 | | Unverified
2 | ViLT (Zero-Shot) | Jaccard Index | 52 | | Unverified
3 | X-VLM (Zero-Shot) | Jaccard Index | 46 | | Unverified
4 | CLIP-ViT-B/32 (Zero-Shot) | Jaccard Index | 41 | | Unverified
5 | CLIP-ViT-L/14 (Zero-Shot) | Jaccard Index | 40 | | Unverified
6 | CLIP-RN50x64/14 (Zero-Shot) | Jaccard Index | 38 | | Unverified
7 | CLIP-RN50 (Zero-Shot) | Jaccard Index | 35 | | Unverified
8 | CLIP-ViL (Zero-Shot) | Jaccard Index | 15 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LXMERT | accuracy | 70.1 | | Unverified
2 | ViLT | accuracy | 69.3 | | Unverified
3 | CLIP (finetuned) | accuracy | 65.1 | | Unverified
4 | CLIP (frozen) | accuracy | 56 | | Unverified
5 | VisualBERT | accuracy | 55.2 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | RPIN | AUCCESS | 42.2 | | Unverified
2 | Dec[Joint]1f | AUCCESS | 40.3 | | Unverified
3 | Dynamics-Aware DQN | AUCCESS | 39.9 | | Unverified
4 | DQN | AUCCESS | 36.8 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | Dynamics-Aware DQN | AUCCESS | 85.2 | | Unverified
2 | RPIN | AUCCESS | 85.2 | | Unverified
3 | Dec[Joint]1f | AUCCESS | 80 | | Unverified
4 | DQN | AUCCESS | 77.6 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | Swin | 1:1 Accuracy | 52.9 | | Unverified
2 | ConvNeXt | 1:1 Accuracy | 51.2 | | Unverified
3 | ViT | 1:1 Accuracy | 50.3 | | Unverified
4 | DEiT | 1:1 Accuracy | 47.2 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | Humans | 1-of-100 Accuracy | 100 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | VisualBERT | Accuracy (Dev) | 67.4 | | Unverified