SOTAVerified

Visual Reasoning

The ability to understand the actions and reasoning associated with visual images.

Papers

Showing 601–650 of 698 papers

| Title | Status | Hype |
| --- | --- | --- |
| VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity |  | 0 |
| Towards Explainable Neural-Symbolic Visual Reasoning |  | 0 |
| HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model |  | 0 |
| HalluSegBench: Counterfactual Visual Reasoning for Segmentation Hallucination Evaluation |  | 0 |
| Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning |  | 0 |
| VGR: Visual Grounded Reasoning |  | 0 |
| Guiding Visual Question Answering with Attention Priors |  | 0 |
| GSR-BENCH: A Benchmark for Grounded Spatial Reasoning Evaluation via Multimodal LLMs |  | 0 |
| I Know About "Up"! Enhancing Spatial Reasoning in Visual Language Models Through 3D Reconstruction |  | 0 |
| Abstract Diagrammatic Reasoning with Multiplex Graph Networks |  | 0 |
| Image as a Foreign Language: BEiT Pretraining for Vision and Vision-Language Tasks |  | 0 |
| Image-of-Thought Prompting for Visual Reasoning Refinement in Multimodal Large Language Models |  | 0 |
| GSON: A Group-based Social Navigation Framework with Large Multimodal Model |  | 0 |
| Video Captioning Using Weak Annotation |  | 0 |
| Ground-R1: Incentivizing Grounded Visual Reasoning via Reinforcement Learning |  | 0 |
| Impact of ML Optimization Tactics on Greener Pre-Trained ML Models |  | 0 |
| Compromising Embodied Agents with Contextual Backdoor Attacks |  | 0 |
| Improving Generalization in Visual Reasoning via Self-Ensemble |  | 0 |
| Improving Scene Graph Classification by Exploiting Knowledge from Texts |  | 0 |
| Incorporating Structured Representations into Pretrained Vision & Language Models Using Scene Graphs |  | 0 |
| Grounding Physical Object and Event Concepts Through Dynamic Visual Reasoning |  | 0 |
| INFERNO: Inferring Object-Centric 3D Scene Representations without Supervision |  | 0 |
| A Survey on Multimodal Large Language Models |  | 0 |
| Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning |  | 0 |
| Grounded Reinforcement Learning for Visual Reasoning |  | 0 |
| Integrating LMM Planners and 3D Skill Policies for Generalizable Manipulation |  | 0 |
| GRIT: Teaching MLLMs to Think with Images |  | 0 |
| Graph Representation for Order-Aware Visual Transformation |  | 0 |
| Interpretable Visual Reasoning via Probabilistic Formulation under Natural Supervision |  | 0 |
| ViLEM: Visual-Language Error Modeling for Image-Text Retrieval |  | 0 |
| Grammar-Based Grounded Lexicon Learning |  | 0 |
| Introduction to Soar |  | 0 |
| A survey on knowledge-enhanced multimodal learning |  | 0 |
| GenVP: Generating Visual Puzzles with Contrastive Hierarchical VAEs |  | 0 |
| Generate Subgoal Images before Act: Unlocking the Chain-of-Thought Reasoning in Diffusion Model for Robot Manipulation with Multimodal Prompts |  | 0 |
| Iterative Search for Weakly Supervised Semantic Parsing |  | 0 |
| Iterative Visual Reasoning Beyond Convolutions |  | 0 |
| It's Not About the Journey; It's About the Destination: Following Soft Paths Under Question-Guidance for Visual Reasoning |  | 0 |
| A Survey of Slow Thinking-based Reasoning LLMs using Reinforced Learning and Inference-time Scaling Law |  | 0 |
| Jointly Visual- and Semantic-Aware Graph Memory Networks for Temporal Sentence Localization in Videos |  | 0 |
| ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling |  | 0 |
| ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning |  | 0 |
| 'Just because you are right, doesn't mean I am wrong': Overcoming a bottleneck in development and evaluation of Open-Ended VQA tasks |  | 0 |
| Just Say the Name: Online Continual Learning with Category Names Only via Data Generation |  | 0 |
| A-I-RAVEN and I-RAVEN-Mesh: Two New Benchmarks for Abstract Visual Reasoning |  | 0 |
| GAM-Agent: Game-Theoretic and Uncertainty-Aware Collaboration for Complex Visual Reasoning |  | 0 |
| A Review of Emerging Research Directions in Abstract Visual Reasoning |  | 0 |
| KokushiMD-10: Benchmark for Evaluating Large Language Models on Ten Japanese National Healthcare Licensing Examinations |  | 0 |
| Language-Conditioned Robotic Manipulation with Fast and Slow Thinking |  | 0 |
| Language-Guided Salient Object Ranking |  | 0 |

Benchmark Results

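| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | GPT-4o + CA | Text Score | 75.5 |  | Unverified |
| 2 | GPT-4V (CoT, pick b/w two options) | Text Score | 75.25 |  | Unverified |
| 3 | GPT-4V (pick b/w two options) | Text Score | 69.25 |  | Unverified |
| 4 | MMICL + CoCoT | Text Score | 64.25 |  | Unverified |
| 5 | GPT-4V + CoCoT | Text Score | 58.5 |  | Unverified |
| 6 | OpenFlamingo + CoCoT | Text Score | 58.25 |  | Unverified |
| 7 | GPT-4V | Text Score | 54.5 |  | Unverified |
| 8 | FIBER (EqSim) | Text Score | 51.5 |  | Unverified |
| 9 | FIBER (finetuned, Flickr30k) | Text Score | 51.25 |  | Unverified |
| 10 | MMICL + CCoT | Text Score | 51 |  | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | BEiT-3 | Accuracy | 91.51 |  | Unverified |
| 2 | X2-VLM (large) | Accuracy | 88.7 |  | Unverified |
| 3 | XFM (base) | Accuracy | 87.6 |  | Unverified |
| 4 | X2-VLM (base) | Accuracy | 86.2 |  | Unverified |
| 5 | CoCa | Accuracy | 86.1 |  | Unverified |
| 6 | VLMo | Accuracy | 85.64 |  | Unverified |
| 7 | VK-OOD | Accuracy | 84.6 |  | Unverified |
| 8 | SimVLM | Accuracy | 84.53 |  | Unverified |
| 9 | X-VLM (base) | Accuracy | 84.41 |  | Unverified |
| 10 | VK-OOD | Accuracy | 83.9 |  | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | BEiT-3 | Accuracy | 92.58 |  | Unverified |
| 2 | X2-VLM (large) | Accuracy | 89.4 |  | Unverified |
| 3 | XFM (base) | Accuracy | 88.4 |  | Unverified |
| 4 | X2-VLM (base) | Accuracy | 87 |  | Unverified |
| 5 | CoCa | Accuracy | 87 |  | Unverified |
| 6 | VLMo | Accuracy | 86.86 |  | Unverified |
| 7 | SimVLM | Accuracy | 85.15 |  | Unverified |
| 8 | X-VLM (base) | Accuracy | 84.76 |  | Unverified |
| 9 | BLIP-129M | Accuracy | 83.09 |  | Unverified |
| 10 | ALBEF (14M) | Accuracy | 82.55 |  | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | AI Core | Average-per ques. | 95.24 |  | Unverified |
| 2 | redherring | Average-per ques. | 91.14 |  | Unverified |
| 3 | VRDP | Average-per ques. | 90.24 |  | Unverified |
| 4 | Fightttttt | Average-per ques. | 88.71 |  | Unverified |
| 5 | neural | Average-per ques. | 88.27 |  | Unverified |
| 6 | NERV | Average-per ques. | 88.05 |  | Unverified |
| 7 | DCL | Average-per ques. | 75.52 |  | Unverified |
| 8 | troublesolver | Average-per ques. | 73.3 |  | Unverified |
| 9 | v0.1 | Average-per ques. | 73.1 |  | Unverified |
| 10 | First_test | Average-per ques. | 69.65 |  | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Gemini-2.0 + CA | 2-Class Accuracy | 93.6 |  | Unverified |
| 2 | GPT-4o + CA | 2-Class Accuracy | 92.8 |  | Unverified |
| 3 | Human | 2-Class Accuracy | 91 |  | Unverified |
| 4 | SNAIL | 2-Class Accuracy | 64 |  | Unverified |
| 5 | InstructBLIP + GPT-4 | 2-Class Accuracy | 63.8 |  | Unverified |
| 6 | BLIP-2 + ChatGPT (Fine-tuned) | 2-Class Accuracy | 63.3 |  | Unverified |
| 7 | InstructBLIP + ChatGPT + Neuro-Symbolic | 2-Class Accuracy | 55.5 |  | Unverified |
| 8 | ChatCaptioner + ChatGPT | 2-Class Accuracy | 49.3 |  | Unverified |
| 9 | Otter | 2-Class Accuracy | 49.3 |  | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Humans | Jaccard Index | 90 |  | Unverified |
| 2 | ViLT (Zero-Shot) | Jaccard Index | 52 |  | Unverified |
| 3 | X-VLM (Zero-Shot) | Jaccard Index | 46 |  | Unverified |
| 4 | CLIP-ViT-B/32 (Zero-Shot) | Jaccard Index | 41 |  | Unverified |
| 5 | CLIP-ViT-L/14 (Zero-Shot) | Jaccard Index | 40 |  | Unverified |
| 6 | CLIP-RN50x64/14 (Zero-Shot) | Jaccard Index | 38 |  | Unverified |
| 7 | CLIP-RN50 (Zero-Shot) | Jaccard Index | 35 |  | Unverified |
| 8 | CLIP-ViL (Zero-Shot) | Jaccard Index | 15 |  | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | LXMERT | accuracy | 70.1 |  | Unverified |
| 2 | ViLT | accuracy | 69.3 |  | Unverified |
| 3 | CLIP (finetuned) | accuracy | 65.1 |  | Unverified |
| 4 | CLIP (frozen) | accuracy | 56 |  | Unverified |
| 5 | VisualBERT | accuracy | 55.2 |  | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | RPIN | AUCCESS | 42.2 |  | Unverified |
| 2 | Dec[Joint]1f | AUCCESS | 40.3 |  | Unverified |
| 3 | Dynamics-Aware DQN | AUCCESS | 39.9 |  | Unverified |
| 4 | DQN | AUCCESS | 36.8 |  | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | RPIN | AUCCESS | 85.2 |  | Unverified |
| 2 | Dynamics-Aware DQN | AUCCESS | 85.2 |  | Unverified |
| 3 | Dec[Joint]1f | AUCCESS | 80 |  | Unverified |
| 4 | DQN | AUCCESS | 77.6 |  | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Swin | 1:1 Accuracy | 52.9 |  | Unverified |
| 2 | ConvNeXt | 1:1 Accuracy | 51.2 |  | Unverified |
| 3 | ViT | 1:1 Accuracy | 50.3 |  | Unverified |
| 4 | DEiT | 1:1 Accuracy | 47.2 |  | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Humans | 1-of-100 Accuracy | 100 |  | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | VisualBERT | Accuracy (Dev) | 67.4 |  | Unverified |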