| Progressive Multi-granular Alignments for Grounded Reasoning in Large Vision-Language Models | Dec 11, 2024 | Question Answering, Visual Grounding | Code Available | 0 | 5 |
| Prompting Large Vision-Language Models for Compositional Reasoning | Jan 20, 2024 | Retrieval, Visual Reasoning | Code Available | 0 | 5 |
| QLEVR: A Diagnostic Dataset for Quantificational Language and Elementary Visual Reasoning | May 6, 2022 | Diagnostic, Question Answering | Code Available | 0 | 5 |
| Raven's Progressive Matrices Completion with Latent Gaussian Process Priors | Mar 22, 2021 | Answer Selection, Gaussian Processes | Code Available | 0 | 5 |
| Revisiting Disentanglement in Downstream Tasks: A Study on Its Necessity for Abstract Visual Reasoning | Mar 1, 2024 | Disentanglement, Informativeness | Code Available | 0 | 5 |
| RVTBench: A Benchmark for Visual Reasoning Tasks | May 17, 2025 | Reasoning Segmentation, Visual Question Answering (VQA) | Code Available | 0 | 5 |
| SAViR-T: Spatially Attentive Visual Reasoning with Transformers | Jun 18, 2022 | Inductive Bias, Visual Reasoning | Code Available | 0 | 5 |
| Slot Abstractors: Toward Scalable Abstract Visual Reasoning | Mar 6, 2024 | Object, Systematic Generalization | Code Available | 0 | 5 |
| Smart Home Appliances: Chat with Your Fridge | Dec 19, 2019 | Dataset Generation, Visual Reasoning | Code Available | 0 | 5 |
| Socratic Questioning: Learn to Self-guide Multimodal Reasoning in the Wild | Jan 6, 2025 | Hallucination, Multimodal Reasoning | Code Available | 0 | 5 |
| Solving ARC visual analogies with neural embeddings and vector arithmetic: A generalized method | Nov 14, 2023 | ARC, Dimensionality Reduction | Code Available | 0 | 5 |
| STAR-R1: Spacial TrAnsformation Reasoning by Reinforcing Multimodal LLMs | May 21, 2025 | Efficient Exploration, Reinforcement Learning (RL) | Code Available | 0 | 5 |
| Stop Pre-Training: Adapt Visual-Language Models to Unseen Languages | Jun 29, 2023 | Image-text Retrieval, Machine Translation | Code Available | 0 | 5 |
| Systematic Visual Reasoning through Object-Centric Relational Abstraction | Jun 4, 2023 | Object, Systematic Generalization | Code Available | 0 | 5 |
| TDBench: Benchmarking Vision-Language Models in Understanding Top-Down Images | Apr 1, 2025 | Autonomous Navigation, Benchmarking | Code Available | 0 | 5 |
| Techniques for Symbol Grounding with SATNet | Jun 16, 2021 | Logical Reasoning, Visual Reasoning | Code Available | 0 | 5 |
| Temporal Reasoning via Audio Question Answering | Nov 21, 2019 | Audio Question Answering, Diagnostic | Code Available | 0 | 5 |
| TGraphX: Tensor-Aware Graph Neural Network for Multi-Dimensional Feature Learning | Apr 4, 2025 | Graph Neural Network, object-detection | Code Available | 0 | 5 |
| The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning | Feb 10, 2022 | Diagnostic, Visual Abductive Reasoning | Code Available | 0 | 5 |
| Five Points to Check when Comparing Visual Perception in Humans and Machines | Apr 20, 2020 | Decision Making, Object Recognition | Code Available | 0 | 5 |
| Thinking with Generated Images | May 28, 2025 | Visual Reasoning | Code Available | 0 | 5 |
| Toward Building General Foundation Models for Language, Vision, and Vision-Language Understanding Tasks | Jan 12, 2023 | Cross-Modal Retrieval, Open-Ended Question Answering | Code Available | 0 | 5 |
| Toward Multi-Granularity Decision-Making: Explicit Visual Reasoning with Hierarchical Knowledge | Jan 1, 2023 | Decision Making, Question Answering | Code Available | 0 | 5 |
| UniT: Multimodal Multitask Learning with a Unified Transformer | Feb 22, 2021 | Decoder, Multimodal Reasoning | Code Available | 0 | 5 |
| Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning | Mar 14, 2018 | Question Answering, Visual Question Answering | Code Available | 0 | 5 |
| Unicode Analogies: An Anti-Objectivist Visual Reasoning Challenge | Jan 1, 2023 | Navigate, Visual Reasoning | Code Available | 0 | 5 |
| OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework | Feb 7, 2022 | Image Captioning, image-classification | Code Available | 0 | 5 |
| Unraveling the geometry of visual relational reasoning | Feb 24, 2025 | Relational Reasoning, Relation Network | Code Available | 0 | 5 |
| VASR: Visual Analogies of Situation Recognition | Dec 8, 2022 | Common Sense Reasoning, Triplet | Code Available | 0 | 5 |
| VDebugger: Harnessing Execution Feedback for Debugging Visual Programs | Jun 19, 2024 | Visual Reasoning | Code Available | 0 | 5 |
| ViCLEVR: A Visual Reasoning Dataset and Hybrid Multimodal Fusion Model for Visual Question Answering in Vietnamese | Oct 27, 2023 | Information Retrieval, Natural Language Queries | Code Available | 0 | 5 |
| ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling | Feb 9, 2024 | Hallucination, Natural Language Understanding | Code Available | 0 | 5 |
| VisFactor: Benchmarking Fundamental Visual Cognition in Multimodal Large Language Models | Feb 23, 2025 | Benchmarking, Spatial Reasoning | Code Available | 0 | 5 |
| VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives | Jun 22, 2022 | Feature Importance, Question Answering | Code Available | 0 | 5 |
| Visual Choice of Plausible Alternatives: An Evaluation of Image-based Commonsense Causal Reasoning | May 1, 2018 | Commonsense Causal Reasoning, Image Captioning | Code Available | 0 | 5 |
| Visual Contexts Clarify Ambiguous Expressions: A Benchmark Dataset | Nov 21, 2024 | Question Answering, Visual Grounding | Code Available | 0 | 5 |
| Visual Question Answering From Another Perspective: CLEVR Mental Rotation Tests | Dec 3, 2022 | Question Answering, Visual Question Answering | Code Available | 0 | 5 |
| Visual Reasoning and Multi-Agent Approach in Multimodal Large Language Models (MLLMs): Solving TSP and mTSP Combinatorial Challenges | Jun 26, 2024 | In-Context Learning, Traveling Salesman Problem | Code Available | 0 | 5 |
| Visual Reasoning by Progressive Module Networks | Jun 6, 2018 | Visual Reasoning | Code Available | 0 | 5 |
| Visual Reasoning in Object-Centric Deep Neural Networks: A Comparative Cognition Approach | Feb 20, 2024 | Object, Relational Reasoning | Code Available | 0 | 5 |
| Visual Reasoning with Multi-hop Feature Modulation | Aug 3, 2018 | Question Answering, Visual Dialog | Code Available | 0 | 5 |
| Visual Transformation Telling | May 3, 2023 | Dense Video Captioning, Video Captioning | Code Available | 0 | 5 |
| V-LoL: A Diagnostic Dataset for Visual Logical Learning | Jun 13, 2023 | Diagnostic, Logical Reasoning | Code Available | 0 | 5 |
| VReST: Enhancing Reasoning in Large Vision-Language Models through Tree Search and Self-Reward Mechanism | Jun 10, 2025 | Mathematical Reasoning, Visual Reasoning | Code Available | 0 | 5 |
| VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding | Mar 21, 2024 | Pose Estimation, Video Understanding | Code Available | 0 | 5 |
| Weakly Supervised Relative Spatial Reasoning for Visual Question Answering | Sep 4, 2021 | Question Answering, Spatial Reasoning | Code Available | 0 | 5 |
| Weakly-supervised Semantic Parsing with Abstract Examples | Nov 14, 2017 | Semantic Parsing, Visual Reasoning | Code Available | 0 | 5 |
| What Is Missing in Multilingual Visual Reasoning and How to Fix It | Mar 3, 2024 | Image Captioning, Visual Reasoning | Code Available | 0 | 5 |
| What is the Visual Cognition Gap between Humans and Multimodal LLMs? | Jun 14, 2024 | object-detection, Object Detection | Code Available | 0 | 5 |
| When Causal Intervention Meets Adversarial Examples and Image Masking for Deep Neural Networks | Feb 9, 2019 | Causal Inference, Visual Reasoning | Code Available | 0 | 5 |