| QLEVR: A Diagnostic Dataset for Quantificational Language and Elementary Visual Reasoning | May 6, 2022 | Diagnostic, Question Answering | Code Available | 0 |
| Answer Questions with Right Image Regions: A Visual Attention Regularization Approach | Feb 3, 2021 | Question Answering, Visual Grounding | Code Available | 0 |
| Prompting Large Vision-Language Models for Compositional Reasoning | Jan 20, 2024 | Retrieval, Visual Reasoning | Code Available | 0 |
| Progressive Multi-granular Alignments for Grounded Reasoning in Large Vision-Language Models | Dec 11, 2024 | Question Answering, Visual Grounding | Code Available | 0 |
| Collecting Visually-Grounded Dialogue with A Game Of Sorts | Sep 10, 2023 | Coreference Resolution, Image Retrieval | Code Available | 0 |
| Raven's Progressive Matrices Completion with Latent Gaussian Process Priors | Mar 22, 2021 | Answer Selection, Gaussian Processes | Code Available | 0 |
| ViCLEVR: A Visual Reasoning Dataset and Hybrid Multimodal Fusion Model for Visual Question Answering in Vietnamese | Oct 27, 2023 | Information Retrieval, Natural Language Queries | Code Available | 0 |
| Enhancing Interpretability and Interactivity in Robot Manipulation: A Neurosymbolic Approach | Oct 3, 2022 | Referring Expression, Robot Manipulation | Code Available | 0 |
| ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling | Feb 9, 2024 | Hallucination, Natural Language Understanding | Code Available | 0 |
| Untrained neural networks can demonstrate memorization-independent abstract reasoning | Jul 25, 2024 | Memorization, Visual Reasoning | Code Available | 0 |
| Program synthesis performance constrained by non-linear spatial relations in Synthetic Visual Reasoning Test | Nov 18, 2019 | Few-Shot Learning, Program Synthesis | Code Available | 0 |
| CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions | Jan 3, 2019 | Diagnostic, Image Segmentation | Code Available | 0 |
| Predicting Complete 3D Models of Indoor Scenes | Apr 9, 2015 | Diversity, Visual Reasoning | Code Available | 0 |
| Physical Reasoning Using Dynamics-Aware Models | Feb 20, 2021 | Visual Reasoning | Code Available | 0 |
| Orchestrator-Agent Trust: A Modular Agentic AI Visual Classification System with Trust-Aware Orchestration and RAG-Based Reasoning | Jul 9, 2025 | Benchmarking, Image Retrieval | Code Available | 0 |
| One Self-Configurable Model to Solve Many Abstract Visual Reasoning Problems | Dec 15, 2023 | Odd One Out, Transfer Learning | Code Available | 0 |
| Interpretable Visual Reasoning via Induced Symbolic Space | Nov 23, 2020 | Visual Question Answering (VQA), Visual Reasoning | Code Available | 0 |
| Inferring and Executing Programs for Visual Reasoning | May 10, 2017 | Visual Question Answering (VQA), Visual Reasoning | Code Available | 0 |
| On Erroneous Agreements of CLIP Image Embeddings | Nov 7, 2024 | Visual Reasoning | Code Available | 0 |
| Implicit Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis | Sep 21, 2023 | Cross-Modal Retrieval, Image Captioning | Code Available | 0 |
| Revisiting Disentanglement in Downstream Tasks: A Study on Its Necessity for Abstract Visual Reasoning | Mar 1, 2024 | Disentanglement, Informativeness | Code Available | 0 |
| ViPlan: A Benchmark for Visual Planning with Symbolic Predicates and Vision-Language Models | May 19, 2025 | Visual Reasoning | Code Available | 0 |
| Odd-One-Out Representation Learning | Dec 14, 2020 | Disentanglement, Metric Learning | Code Available | 0 |
| Image Safeguarding: Reasoning with Conditional Vision Language Model and Obfuscating Unsafe Content Counterfactually | Jan 19, 2024 | counterfactual, Counterfactual Explanation | Code Available | 0 |
| When Causal Intervention Meets Adversarial Examples and Image Masking for Deep Neural Networks | Feb 9, 2019 | Causal Inference, Visual Reasoning | Code Available | 0 |
| OCR-Reasoning Benchmark: Unveiling the True Capabilities of MLLMs in Complex Text-Rich Image Reasoning | May 22, 2025 | Optical Character Recognition (OCR), Visual Reasoning | Code Available | 0 |
| Object Level Visual Reasoning in Videos | Jun 16, 2018 | Activity Recognition, Human Activity Recognition | Code Available | 0 |
| RVTBench: A Benchmark for Visual Reasoning Tasks | May 17, 2025 | Reasoning Segmentation, Visual Question Answering (VQA) | Code Available | 0 |
| V-LoL: A Diagnostic Dataset for Visual Logical Learning | Jun 13, 2023 | Diagnostic, Logical Reasoning | Code Available | 0 |
| SAViR-T: Spatially Attentive Visual Reasoning with Transformers | Jun 18, 2022 | Inductive Bias, Visual Reasoning | Code Available | 0 |
| CLEVR Parser: A Graph Parser Library for Geometric Learning on Language Grounded Image Scenes | Sep 19, 2020 | Graph Neural Network, Visual Reasoning | Code Available | 0 |
| VisFactor: Benchmarking Fundamental Visual Cognition in Multimodal Large Language Models | Feb 23, 2025 | Benchmarking, Spatial Reasoning | Code Available | 0 |
| Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation | Mar 10, 2023 | Image Generation, multimodal generation | Code Available | 0 |
| Multi-Modal Dialogue State Tracking for Playing GuessWhich Game | Aug 15, 2024 | Dialogue State Tracking, Visual Reasoning | Code Available | 0 |
| VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives | Jun 22, 2022 | Feature Importance, Question Answering | Code Available | 0 |
| Multilevel Hierarchical Network with Multiscale Sampling for Video Question Answering | May 9, 2022 | multimodal interaction, Question Answering | Code Available | 0 |
| Multi-Label Zero-Shot Learning with Structured Knowledge Graphs | Nov 17, 2017 | General Classification, Knowledge Graphs | Code Available | 0 |
| Multi-Label Contrastive Learning for Abstract Visual Reasoning | Dec 3, 2020 | Contrastive Learning, Data Augmentation | Code Available | 0 |
| MM-PoE: Multiple Choice Reasoning via. Process of Elimination using Multi-Modal Models | Dec 10, 2024 | Multiple-choice, Question Answering | Code Available | 0 |
| Mind the GAP: Glimpse-based Active Perception improves generalization and sample efficiency of visual reasoning | Sep 30, 2024 | Visual Reasoning | Code Available | 0 |
| CLEVRER: CoLlision Events for Video REpresentation and Reasoning | Oct 3, 2019 | counterfactual, Descriptive | Code Available | 0 |
| Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks | Aug 22, 2022 | All, Cross-Modal Retrieval | Code Available | 0 |
| Meta Module Network for Compositional Visual Reasoning | Oct 8, 2019 | MORPH, Visual Reasoning | Code Available | 0 |
| A Distance-preserving Matrix Sketch | Sep 8, 2020 | Clustering, feature selection | Code Available | 0 |
| How to Determine the Preferred Image Distribution of a Black-Box Vision-Language Model? | Sep 3, 2024 | In-Context Learning, Language Modeling | Code Available | 0 |
| WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models | Jul 25, 2022 | Common Sense Reasoning, General Knowledge | Code Available | 0 |
| How a General-Purpose Commonsense Ontology can Improve Performance of Learning-Based Image Retrieval | May 24, 2017 | Image Retrieval, Retrieval | Code Available | 0 |
| MCTBench: Multimodal Cognition towards Text-Rich Visual Scenes Benchmark | Oct 15, 2024 | Fairness, Scene Text Recognition | Code Available | 0 |
| Slot Abstractors: Toward Scalable Abstract Visual Reasoning | Mar 6, 2024 | Object, Systematic Generalization | Code Available | 0 |
| MARVEL: Multidimensional Abstraction and Reasoning through Visual Evaluation and Learning | Apr 21, 2024 | Visual Reasoning | Code Available | 0 |