| CLEVR-Ref+: Diagnosing Visual Reasoning with Referring Expressions | Jan 3, 2019 | Diagnostic, Image Segmentation | Code Available | 0 | 5 |
| CLEVR Parser: A Graph Parser Library for Geometric Learning on Language Grounded Image Scenes | Sep 19, 2020 | Graph Neural Network, Visual Reasoning | Code Available | 0 | 5 |
| CLEVRER: CoLlision Events for Video REpresentation and Reasoning | Oct 3, 2019 | counterfactual, Descriptive | Code Available | 0 | 5 |
| Solving ARC visual analogies with neural embeddings and vector arithmetic: A generalized method | Nov 14, 2023 | ARC, Dimensionality Reduction | Code Available | 0 | 5 |
| Smart Home Appliances: Chat with Your Fridge | Dec 19, 2019 | Dataset Generation, Visual Reasoning | Code Available | 0 | 5 |
| Socratic Questioning: Learn to Self-guide Multimodal Reasoning in the Wild | Jan 6, 2025 | Hallucination, Multimodal Reasoning | Code Available | 0 | 5 |
| STAR-R1: Spacial TrAnsformation Reasoning by Reinforcing Multimodal LLMs | May 21, 2025 | Efficient Exploration, Reinforcement Learning (RL) | Code Available | 0 | 5 |
| A Distance-preserving Matrix Sketch | Sep 8, 2020 | Clustering, feature selection | Code Available | 0 | 5 |
| Slot Abstractors: Toward Scalable Abstract Visual Reasoning | Mar 6, 2024 | Object, Systematic Generalization | Code Available | 0 | 5 |
| FigureQA: An Annotated Figure Dataset for Visual Reasoning | Oct 19, 2017 | BIG-bench Machine Learning, Chart Question Answering | Code Available | 0 | 5 |
| Stop Pre-Training: Adapt Visual-Language Models to Unseen Languages | Jun 29, 2023 | Image-text Retrieval, Machine Translation | Code Available | 0 | 5 |
| A Dataset and Architecture for Visual Reasoning with a Working Memory | Mar 16, 2018 | Diagnostic, Logical Reasoning | Code Available | 0 | 5 |
| RVTBench: A Benchmark for Visual Reasoning Tasks | May 17, 2025 | Reasoning Segmentation, Visual Question Answering (VQA) | Code Available | 0 | 5 |
| SAViR-T: Spatially Attentive Visual Reasoning with Transformers | Jun 18, 2022 | Inductive Bias, Visual Reasoning | Code Available | 0 | 5 |
| ChartSketcher: Reasoning with Multimodal Feedback and Reflection for Chart Understanding | May 25, 2025 | Chart Understanding, Logical Reasoning | Code Available | 0 | 5 |
| Explainable and Explicit Visual Reasoning over Scene Graphs | Dec 5, 2018 | Inductive Bias, Visual Question Answering (VQA) | Code Available | 0 | 5 |
| Revisiting Disentanglement in Downstream Tasks: A Study on Its Necessity for Abstract Visual Reasoning | Mar 1, 2024 | Disentanglement, Informativeness | Code Available | 0 | 5 |
| Raven's Progressive Matrices Completion with Latent Gaussian Process Priors | Mar 22, 2021 | Answer Selection, Gaussian Processes | Code Available | 0 | 5 |
| QLEVR: A Diagnostic Dataset for Quantificational Language and Elementary Visual Reasoning | May 6, 2022 | Diagnostic, Question Answering | Code Available | 0 | 5 |
| Predicting Complete 3D Models of Indoor Scenes | Apr 9, 2015 | Diversity, Visual Reasoning | Code Available | 0 | 5 |
| Program synthesis performance constrained by non-linear spatial relations in Synthetic Visual Reasoning Test | Nov 18, 2019 | Few-Shot Learning, Program Synthesis | Code Available | 0 | 5 |
| A Plug-and-Play Method for Rare Human-Object Interactions Detection by Bridging Domain Gap | Jul 31, 2024 | Human-Object Interaction Detection, Image Reconstruction | Code Available | 0 | 5 |
| Enforcing Consistency in Weakly Supervised Semantic Parsing | Jul 13, 2021 | Semantic Parsing, Visual Reasoning | Code Available | 0 | 5 |
| Physical Reasoning Using Dynamics-Aware Models | Feb 20, 2021 | Visual Reasoning | Code Available | 0 | 5 |
| Progressive Multi-granular Alignments for Grounded Reasoning in Large Vision-Language Models | Dec 11, 2024 | Question Answering, Visual Grounding | Code Available | 0 | 5 |
| Cascaded Mutual Modulation for Visual Reasoning | Sep 6, 2018 | Question Answering, Visual Question Answering | Code Available | 0 | 5 |
| Orchestrator-Agent Trust: A Modular Agentic AI Visual Classification System with Trust-Aware Orchestration and RAG-Based Reasoning | Jul 9, 2025 | Benchmarking, Image Retrieval | Code Available | 0 | 5 |
| Answer Questions with Right Image Regions: A Visual Attention Regularization Approach | Feb 3, 2021 | Question Answering, Visual Grounding | Code Available | 0 | 5 |
| One Self-Configurable Model to Solve Many Abstract Visual Reasoning Problems | Dec 15, 2023 | Odd One Out, Transfer Learning | Code Available | 0 | 5 |
| On Erroneous Agreements of CLIP Image Embeddings | Nov 7, 2024 | Visual Reasoning | Code Available | 0 | 5 |
| Prompting Large Vision-Language Models for Compositional Reasoning | Jan 20, 2024 | Retrieval, Visual Reasoning | Code Available | 0 | 5 |
| Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation | Mar 10, 2023 | Image Generation, multimodal generation | Code Available | 0 | 5 |
| Object Level Visual Reasoning in Videos | Jun 16, 2018 | Activity Recognition, Human Activity Recognition | Code Available | 0 | 5 |
| OCR-Reasoning Benchmark: Unveiling the True Capabilities of MLLMs in Complex Text-Rich Image Reasoning | May 22, 2025 | Optical Character Recognition (OCR), Visual Reasoning | Code Available | 0 | 5 |
| Bottom-Up Shift and Reasoning for Referring Image Segmentation | Jun 19, 2021 | Image Segmentation, Segmentation | Code Available | 0 | 5 |
| Multi-Modal Dialogue State Tracking for Playing GuessWhich Game | Aug 15, 2024 | Dialogue State Tracking, Visual Reasoning | Code Available | 0 | 5 |
| Multi-Label Contrastive Learning for Abstract Visual Reasoning | Dec 3, 2020 | Contrastive Learning, Data Augmentation | Code Available | 0 | 5 |
| Multi-Label Zero-Shot Learning with Structured Knowledge Graphs | Nov 17, 2017 | General Classification, Knowledge Graphs | Code Available | 0 | 5 |
| MM-PoE: Multiple Choice Reasoning via. Process of Elimination using Multi-Modal Models | Dec 10, 2024 | Multiple-choice, Question Answering | Code Available | 0 | 5 |
| Multilevel Hierarchical Network with Multiscale Sampling for Video Question Answering | May 9, 2022 | multimodal interaction, Question Answering | Code Available | 0 | 5 |
| Odd-One-Out Representation Learning | Dec 14, 2020 | Disentanglement, Metric Learning | Code Available | 0 | 5 |
| Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis | Feb 11, 2023 | Image-text Retrieval, Knowledge Graphs | Code Available | 0 | 5 |
| A Corpus for Reasoning About Natural Language Grounded in Photographs | Nov 1, 2018 | Diversity, Visual Reasoning | Code Available | 0 | 5 |
| Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad? | Oct 25, 2024 | Visual Reasoning | Code Available | 0 | 5 |
| Mind the GAP: Glimpse-based Active Perception improves generalization and sample efficiency of visual reasoning | Sep 30, 2024 | Visual Reasoning | Code Available | 0 | 5 |
| KnowZRel: Common Sense Knowledge-based Zero-Shot Relationship Retrieval for Generalised Scene Graph Generation | Feb 21, 2025 | Common Sense Reasoning, Graph Generation | Code Available | 0 | 5 |
| Deconfounded Visual Grounding | Dec 31, 2021 | Referring Expression, Visual Grounding | Code Available | 0 | 5 |
| MARVEL: Multidimensional Abstraction and Reasoning through Visual Evaluation and Learning | Apr 21, 2024 | Visual Reasoning | Code Available | 0 | 5 |
| MCTBench: Multimodal Cognition towards Text-Rich Visual Scenes Benchmark | Oct 15, 2024 | Fairness, Scene Text Recognition | Code Available | 0 | 5 |
| 'Just because you are right, doesn't mean I am wrong': Overcoming a Bottleneck in the Development and Evaluation of Open-Ended Visual Question Answering (VQA) Tasks | Mar 28, 2021 | Question Answering, Visual Question Answering | Code Available | 0 | 5 |