SOTAVerified

Visual Reasoning

The ability to understand actions and reasoning associated with visual images.

Papers

Showing 401–450 of 698 papers

Title | Status | Hype
Boosting Cross-task Transferability of Adversarial Patches with Visual Relations | | 0
CAVL: Learning Contrastive and Adaptive Representations of Vision and Language | | 0
Explainable AI And Visual Reasoning: Insights From Radiology | | 0
Navigating to Objects Specified by Images | | 0
Going Beyond Nouns With Vision & Language Models Using Synthetic Data | Code | 1
Your Diffusion Model is Secretly a Zero-Shot Classifier | Code | 2
Curriculum Learning for Compositional Visual Reasoning | | 0
IRFL: Image Recognition of Figurative Language | Code | 1
Equivariant Similarity for Vision-Language Foundation Models | Code | 1
NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations | Code | 1
Is BERT Blind? Exploring the Effect of Vision-and-Language Pretraining on Visual Language Understanding | Code | 1
Abstract Visual Reasoning: An Algebraic Approach for Solving Raven's Progressive Matrices | Code | 1
3D Concept Learning and Reasoning from Multi-View Images | | 0
Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasoning | Code | 1
ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions | Code | 2
Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning | | 0
Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation | Code | 0
Abstract Visual Reasoning Enabled by Language | | 0
Visual Analytics of Neuron Vulnerability to Adversarial Attacks on Convolutional Neural Networks | | 0
Learning to reason over visual objects | Code | 0
Jointly Visual- and Semantic-Aware Graph Memory Networks for Temporal Sentence Localization in Videos | | 0
Explicit3D: Graph Network with Spatial Inference for Single Image 3D Object Detection | | 0
Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis | Code | 0
Learning to Agree on Vision Attention for Visual Commonsense Reasoning | | 0
Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications | Code | 1
UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers | Code | 1
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | Code | 4
Toward Building General Foundation Models for Language, Vision, and Vision-Language Understanding Tasks | Code | 0
See, Think, Confirm: Interactive Prompting Between Vision and Language Models for Knowledge-based Visual Reasoning | Code | 1
A Divide-Align-Conquer Strategy for Program Synthesis | | 0
Open Set Video HOI detection from Action-Centric Chain-of-Look Prompting | | 0
Toward Multi-Granularity Decision-Making: Explicit Visual Reasoning with Hierarchical Knowledge | Code | 0
Graph Representation for Order-Aware Visual Transformation | | 0
ViLEM: Visual-Language Error Modeling for Image-Text Retrieval | | 0
Unicode Analogies: An Anti-Objectivist Visual Reasoning Challenge | Code | 0
Image as a Foreign Language: BEiT Pretraining for Vision and Vision-Language Tasks | | 0
Context-Aware Alignment and Mutual Masking for 3D-Language Pre-Training | Code | 1
EuclidNet: Deep Visual Reasoning for Constructible Problems in Geometry | | 0
VQA and Visual Reasoning: An Overview of Recent Datasets, Methods and Challenges | | 0
Cross-modal Attention Congruence Regularization for Vision-Language Relation Alignment | Code | 1
Towards Unsupervised Visual Reasoning: Do Off-The-Shelf Features Know How to Reason? | | 0
MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering | Code | 1
Position-guided Text Prompt for Vision-Language Pre-training | Code | 1
Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift | Code | 1
VASR: Visual Analogies of Situation Recognition | Code | 0
Does Structural Attention Improve Compositional Representations in Vision-Language Models? | | 0
Visual Question Answering From Another Perspective: CLEVR Mental Rotation Tests | Code | 0
Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning | Code | 1
Abstract Visual Reasoning with Tangram Shapes | | 0
Perceive, Ground, Reason, and Act: A Benchmark for General-purpose Visual Representation | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4o + CA | Text Score | 75.5 | | Unverified
2 | GPT-4V (CoT, pick b/w two options) | Text Score | 75.25 | | Unverified
3 | GPT-4V (pick b/w two options) | Text Score | 69.25 | | Unverified
4 | MMICL + CoCoT | Text Score | 64.25 | | Unverified
5 | GPT-4V + CoCoT | Text Score | 58.5 | | Unverified
6 | OpenFlamingo + CoCoT | Text Score | 58.25 | | Unverified
7 | GPT-4V | Text Score | 54.5 | | Unverified
8 | FIBER (EqSim) | Text Score | 51.5 | | Unverified
9 | FIBER (finetuned, Flickr30k) | Text Score | 51.25 | | Unverified
10 | MMICL + CCoT | Text Score | 51 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | BEiT-3 | Accuracy | 91.51 | | Unverified
2 | X2-VLM (large) | Accuracy | 88.7 | | Unverified
3 | XFM (base) | Accuracy | 87.6 | | Unverified
4 | X2-VLM (base) | Accuracy | 86.2 | | Unverified
5 | CoCa | Accuracy | 86.1 | | Unverified
6 | VLMo | Accuracy | 85.64 | | Unverified
7 | VK-OOD | Accuracy | 84.6 | | Unverified
8 | SimVLM | Accuracy | 84.53 | | Unverified
9 | X-VLM (base) | Accuracy | 84.41 | | Unverified
10 | VK-OOD | Accuracy | 83.9 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | BEiT-3 | Accuracy | 92.58 | | Unverified
2 | X2-VLM (large) | Accuracy | 89.4 | | Unverified
3 | XFM (base) | Accuracy | 88.4 | | Unverified
4 | X2-VLM (base) | Accuracy | 87 | | Unverified
5 | CoCa | Accuracy | 87 | | Unverified
6 | VLMo | Accuracy | 86.86 | | Unverified
7 | SimVLM | Accuracy | 85.15 | | Unverified
8 | X-VLM (base) | Accuracy | 84.76 | | Unverified
9 | BLIP-129M | Accuracy | 83.09 | | Unverified
10 | ALBEF (14M) | Accuracy | 82.55 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | AI Core | Average-per ques. | 95.24 | | Unverified
2 | redherring | Average-per ques. | 91.14 | | Unverified
3 | VRDP | Average-per ques. | 90.24 | | Unverified
4 | Fightttt | Average-per ques. | 88.71 | | Unverified
5 | neural | Average-per ques. | 88.27 | | Unverified
6 | NERV | Average-per ques. | 88.05 | | Unverified
7 | DCL | Average-per ques. | 75.52 | | Unverified
8 | troublesolver | Average-per ques. | 73.3 | | Unverified
9 | v0.1 | Average-per ques. | 73.1 | | Unverified
10 | First_test | Average-per ques. | 69.65 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Gemini-2.0 + CA | 2-Class Accuracy | 93.6 | | Unverified
2 | GPT-4o + CA | 2-Class Accuracy | 92.8 | | Unverified
3 | Human | 2-Class Accuracy | 91 | | Unverified
4 | SNAIL | 2-Class Accuracy | 64 | | Unverified
5 | InstructBLIP + GPT-4 | 2-Class Accuracy | 63.8 | | Unverified
6 | BLIP-2 + ChatGPT (Fine-tuned) | 2-Class Accuracy | 63.3 | | Unverified
7 | InstructBLIP + ChatGPT + Neuro-Symbolic | 2-Class Accuracy | 55.5 | | Unverified
8 | ChatCaptioner + ChatGPT | 2-Class Accuracy | 49.3 | | Unverified
9 | Otter | 2-Class Accuracy | 49.3 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Humans | Jaccard Index | 90 | | Unverified
2 | ViLT (Zero-Shot) | Jaccard Index | 52 | | Unverified
3 | X-VLM (Zero-Shot) | Jaccard Index | 46 | | Unverified
4 | CLIP-ViT-B/32 (Zero-Shot) | Jaccard Index | 41 | | Unverified
5 | CLIP-ViT-L/14 (Zero-Shot) | Jaccard Index | 40 | | Unverified
6 | CLIP-RN50x64/14 (Zero-Shot) | Jaccard Index | 38 | | Unverified
7 | CLIP-RN50 (Zero-Shot) | Jaccard Index | 35 | | Unverified
8 | CLIP-ViL (Zero-Shot) | Jaccard Index | 15 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | LXMERT | accuracy | 70.1 | | Unverified
2 | ViLT | accuracy | 69.3 | | Unverified
3 | CLIP (finetuned) | accuracy | 65.1 | | Unverified
4 | CLIP (frozen) | accuracy | 56 | | Unverified
5 | VisualBERT | accuracy | 55.2 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | RPIN | AUCCESS | 42.2 | | Unverified
2 | Dec[Joint]1f | AUCCESS | 40.3 | | Unverified
3 | Dynamics-Aware DQN | AUCCESS | 39.9 | | Unverified
4 | DQN | AUCCESS | 36.8 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | RPIN | AUCCESS | 85.2 | | Unverified
2 | Dynamics-Aware DQN | AUCCESS | 85.2 | | Unverified
3 | Dec[Joint]1f | AUCCESS | 80 | | Unverified
4 | DQN | AUCCESS | 77.6 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Swin | 1:1 Accuracy | 52.9 | | Unverified
2 | ConvNeXt | 1:1 Accuracy | 51.2 | | Unverified
3 | ViT | 1:1 Accuracy | 50.3 | | Unverified
4 | DEiT | 1:1 Accuracy | 47.2 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Humans | 1-of-100 Accuracy | 100 | | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | VisualBERT | Accuracy (Dev) | 67.4 | | Unverified