SOTAVerified

Visual Reasoning

The ability to understand the actions and reasoning associated with visual images.

Papers

Showing 451-500 of 698 papers

Title | Status | Hype
Modelling Working Memory using Deep Recurrent Reinforcement Learning | - | 0
Modularity Matters: Learning Invariant Relational Reasoning Tasks | - | 0
Modulated Self-attention Convolutional Network for VQA | - | 0
Muffin or Chihuahua? Challenging Multimodal Large Language Models with Multipanel VQA | - | 0
Multi-Granularity Modularized Network for Abstract Visual Reasoning | - | 0
Multimodal Representations for Teacher-Guided Compositional Visual Reasoning | - | 0
Superpixel Semantics Representation and Pre-training for Vision-Language Task | - | 0
Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners | - | 0
Navigating to Objects Specified by Images | - | 0
Neural-guided, Bidirectional Program Search for Abstraction and Reasoning | - | 0
Neural Structure Mapping For Learning Abstract Visual Analogies | - | 0
Neuro-Symbolic Scene Graph Conditioning for Synthetic Image Dataset Generation | - | 0
Neuro-Symbolic Visual Reasoning: Disentangling "Visual" from "Reasoning" | - | 0
NODE-Adapter: Neural Ordinary Differential Equations for Better Vision-Language Reasoning | - | 0
NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks | - | 0
Not-So-CLEVR: Visual Relations Strain Feedforward Neural Networks | - | 0
NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models | - | 0
Attention over learned object embeddings enables complex visual reasoning | - | 0
Object-Centric Diagnosis of Visual Reasoning | - | 0
Object Ordering with Bidirectional Matchings for Visual Reasoning | - | 0
OC-NMN: Object-centric Compositional Neural Module Network for Generative Visual Analogical Reasoning | - | 0
OmniAD: Detect and Understand Industrial Anomaly via Multimodal Reasoning | - | 0
On Data Synthesis and Post-training for Visual Abstract Reasoning | - | 0
One for All: One-stage Referring Expression Comprehension with Dynamic Reasoning | - | 0
Question Guided Modular Routing Networks for Visual Question Answering | - | 0
RAVEN: A Dataset for Relational and Analogical Visual rEasoNing | - | 0
RBench-V: A Primary Assessment for Visual Reasoning Models with Multi-modal Outputs | - | 0
Reason from Context with Self-supervised Learning | - | 0
Reasoning Limitations of Multimodal Large Language Models. A case study of Bongard Problems | - | 0
Reasoning over Vision and Language: Exploring the Benefits of Supplemental Knowledge | - | 0
Recurrent Vision Transformer for Solving Visual Reasoning Problems | - | 0
Replace-then-Perturb: Targeted Adversarial Attacks With Visual Reasoning for Vision-Language Models | - | 0
Rethinking Bottlenecks in Safety Fine-Tuning of Vision Language Models | - | 0
Retrieving and Highlighting Action with Spatiotemporal Reference | - | 0
Revisiting MLLMs: An In-Depth Analysis of Image Classification Abilities | - | 0
RGB-Th-Bench: A Dense benchmark for Visual-Thermal Understanding of Vision Language Models | - | 0
Robust Visual Reasoning via Language Guided Neural Module Networks | - | 0
Same-different problems strain convolutional neural networks | - | 0
SciVerse: Unveiling the Knowledge Comprehension and Visual Reasoning of LMMs on Multi-modal Scientific Problems | - | 0
Seeing is Believing, but How Much? A Comprehensive Analysis of Verbalized Calibration in Vision-Language Models | - | 0
Seeing the Intangible: Survey of Image Classification into High-Level and Abstract Categories | - | 0
SelfEval: Leveraging the discriminative nature of generative models for evaluation | - | 0
Self-Segregating and Coordinated-Segregating Transformer for Focused Deep Multi-Modular Network for Visual Question Answering | - | 0
Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI | - | 0
SHOP-VRB: A Visual Reasoning Benchmark for Object Perception | - | 0
Does Acceleration Cause Hidden Instability in Vision Language Models? Uncovering Instance-Level Divergence Through a Large-Scale Empirical Study | - | 0
Simple Token-Level Confidence Improves Caption Correctness | - | 0
Slow Perception: Let's Perceive Geometric Figures Step-by-step | - | 0
SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection | - | 0
Social-IQ: A Question Answering Benchmark for Artificial Social Intelligence | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4o + CA | Text Score | 75.5 | - | Unverified
2 | GPT-4V (CoT, pick b/w two options) | Text Score | 75.25 | - | Unverified
3 | GPT-4V (pick b/w two options) | Text Score | 69.25 | - | Unverified
4 | MMICL + CoCoT | Text Score | 64.25 | - | Unverified
5 | GPT-4V + CoCoT | Text Score | 58.5 | - | Unverified
6 | OpenFlamingo + CoCoT | Text Score | 58.25 | - | Unverified
7 | GPT-4V | Text Score | 54.5 | - | Unverified
8 | FIBER (EqSim) | Text Score | 51.5 | - | Unverified
9 | FIBER (finetuned, Flickr30k) | Text Score | 51.25 | - | Unverified
10 | MMICL + CCoT | Text Score | 51 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | BEiT-3 | Accuracy | 91.51 | - | Unverified
2 | X2-VLM (large) | Accuracy | 88.7 | - | Unverified
3 | XFM (base) | Accuracy | 87.6 | - | Unverified
4 | X2-VLM (base) | Accuracy | 86.2 | - | Unverified
5 | CoCa | Accuracy | 86.1 | - | Unverified
6 | VLMo | Accuracy | 85.64 | - | Unverified
7 | VK-OOD | Accuracy | 84.6 | - | Unverified
8 | SimVLM | Accuracy | 84.53 | - | Unverified
9 | X-VLM (base) | Accuracy | 84.41 | - | Unverified
10 | VK-OOD | Accuracy | 83.9 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | BEiT-3 | Accuracy | 92.58 | - | Unverified
2 | X2-VLM (large) | Accuracy | 89.4 | - | Unverified
3 | XFM (base) | Accuracy | 88.4 | - | Unverified
4 | CoCa | Accuracy | 87 | - | Unverified
5 | X2-VLM (base) | Accuracy | 87 | - | Unverified
6 | VLMo | Accuracy | 86.86 | - | Unverified
7 | SimVLM | Accuracy | 85.15 | - | Unverified
8 | X-VLM (base) | Accuracy | 84.76 | - | Unverified
9 | BLIP-129M | Accuracy | 83.09 | - | Unverified
10 | ALBEF (14M) | Accuracy | 82.55 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | AI Core | Average-per ques. | 95.24 | - | Unverified
2 | redherring | Average-per ques. | 91.14 | - | Unverified
3 | VRDP | Average-per ques. | 90.24 | - | Unverified
4 | Fightttttt | Average-per ques. | 88.71 | - | Unverified
5 | neural | Average-per ques. | 88.27 | - | Unverified
6 | NERV | Average-per ques. | 88.05 | - | Unverified
7 | DCL | Average-per ques. | 75.52 | - | Unverified
8 | troublesolver | Average-per ques. | 73.3 | - | Unverified
9 | v0.1 | Average-per ques. | 73.1 | - | Unverified
10 | First_test | Average-per ques. | 69.65 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | Gemini-2.0 + CA | 2-Class Accuracy | 93.6 | - | Unverified
2 | GPT-4o + CA | 2-Class Accuracy | 92.8 | - | Unverified
3 | Human | 2-Class Accuracy | 91 | - | Unverified
4 | SNAIL | 2-Class Accuracy | 64 | - | Unverified
5 | InstructBLIP + GPT-4 | 2-Class Accuracy | 63.8 | - | Unverified
6 | BLIP-2 + ChatGPT (Fine-tuned) | 2-Class Accuracy | 63.3 | - | Unverified
7 | InstructBLIP + ChatGPT + Neuro-Symbolic | 2-Class Accuracy | 55.5 | - | Unverified
8 | ChatCaptioner + ChatGPT | 2-Class Accuracy | 49.3 | - | Unverified
9 | Otter | 2-Class Accuracy | 49.3 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | Humans | Jaccard Index | 90 | - | Unverified
2 | ViLT (Zero-Shot) | Jaccard Index | 52 | - | Unverified
3 | X-VLM (Zero-Shot) | Jaccard Index | 46 | - | Unverified
4 | CLIP-ViT-B/32 (Zero-Shot) | Jaccard Index | 41 | - | Unverified
5 | CLIP-ViT-L/14 (Zero-Shot) | Jaccard Index | 40 | - | Unverified
6 | CLIP-RN50x64/14 (Zero-Shot) | Jaccard Index | 38 | - | Unverified
7 | CLIP-RN50 (Zero-Shot) | Jaccard Index | 35 | - | Unverified
8 | CLIP-ViL (Zero-Shot) | Jaccard Index | 15 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | LXMERT | accuracy | 70.1 | - | Unverified
2 | ViLT | accuracy | 69.3 | - | Unverified
3 | CLIP (finetuned) | accuracy | 65.1 | - | Unverified
4 | CLIP (frozen) | accuracy | 56 | - | Unverified
5 | VisualBERT | accuracy | 55.2 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | RPIN | AUCCESS | 42.2 | - | Unverified
2 | Dec[Joint]1f | AUCCESS | 40.3 | - | Unverified
3 | Dynamics-Aware DQN | AUCCESS | 39.9 | - | Unverified
4 | DQN | AUCCESS | 36.8 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | Dynamics-Aware DQN | AUCCESS | 85.2 | - | Unverified
2 | RPIN | AUCCESS | 85.2 | - | Unverified
3 | Dec[Joint]1f | AUCCESS | 80 | - | Unverified
4 | DQN | AUCCESS | 77.6 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | Swin | 1:1 Accuracy | 52.9 | - | Unverified
2 | ConvNeXt | 1:1 Accuracy | 51.2 | - | Unverified
3 | ViT | 1:1 Accuracy | 50.3 | - | Unverified
4 | DEiT | 1:1 Accuracy | 47.2 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | Humans | 1-of-100 Accuracy | 100 | - | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | VisualBERT | Accuracy (Dev) | 67.4 | - | Unverified