| VReST: Enhancing Reasoning in Large Vision-Language Models through Tree Search and Self-Reward Mechanism | Jun 10, 2025 | Mathematical Reasoning, Visual Reasoning | Code Available | 0 |
| Smart Home Appliances: Chat with Your Fridge | Dec 19, 2019 | Dataset Generation, Visual Reasoning | Code Available | 0 |
| ChartSketcher: Reasoning with Multimodal Feedback and Reflection for Chart Understanding | May 25, 2025 | Chart Understanding, Logical Reasoning | Code Available | 0 |
| HALLUCINOGEN: A Benchmark for Evaluating Object Hallucination in Large Visual-Language Models | Dec 29, 2024 | Hallucination, Object | Code Available | 0 |
| Visual Choice of Plausible Alternatives: An Evaluation of Image-based Commonsense Causal Reasoning | May 1, 2018 | Commonsense Causal Reasoning, Image Captioning | Code Available | 0 |
| Socratic Questioning: Learn to Self-guide Multimodal Reasoning in the Wild | Jan 6, 2025 | Hallucination, Multimodal Reasoning | Code Available | 0 |
| Solving ARC visual analogies with neural embeddings and vector arithmetic: A generalized method | Nov 14, 2023 | ARC, Dimensionality Reduction | Code Available | 0 |
| GAMR: A Guided Attention Model for (visual) Reasoning | Jun 10, 2022 | model, Visual Reasoning | Code Available | 0 |
| Mapping Natural Language Commands to Web Elements | Aug 28, 2018 | Relational Reasoning, Visual Reasoning | Code Available | 0 |
| LVLM-Compress-Bench: Benchmarking the Broader Impact of Large Vision-Language Model Compression | Mar 6, 2025 | Benchmarking, Common Sense Reasoning | Code Available | 0 |
| Cascaded Mutual Modulation for Visual Reasoning | Sep 6, 2018 | Question Answering, Visual Question Answering | Code Available | 0 |
| STAR-R1: Spacial TrAnsformation Reasoning by Reinforcing Multimodal LLMs | May 21, 2025 | Efficient Exploration, Reinforcement Learning (RL) | Code Available | 0 |
| Stop Pre-Training: Adapt Visual-Language Models to Unseen Languages | Jun 29, 2023 | Image-text Retrieval, Machine Translation | Code Available | 0 |
| Leveraging Vision-Language Models for Visual Grounding and Analysis of Automotive UI | May 9, 2025 | 4k, Domain Generalization | Code Available | 0 |
| Leveraging Large Language Models for Scalable Vector Graphics-Driven Image Understanding | Jun 9, 2023 | Few-Shot Learning, image-classification | Code Available | 0 |
| Learning Visual Reasoning Without Strong Priors | Jul 10, 2017 | Visual Reasoning | Code Available | 0 |
| Visual Contexts Clarify Ambiguous Expressions: A Benchmark Dataset | Nov 21, 2024 | Question Answering, Visual Grounding | Code Available | 0 |
| VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding | Mar 21, 2024 | Pose Estimation, Video Understanding | Code Available | 0 |
| GPT4SGG: Synthesizing Scene Graphs from Holistic and Region-specific Narratives | Dec 7, 2023 | Graph Generation, Language Modelling | Code Available | 0 |
| Weakly Supervised Relative Spatial Reasoning for Visual Question Answering | Sep 4, 2021 | Question Answering, Spatial Reasoning | Code Available | 0 |
| Systematic Visual Reasoning through Object-Centric Relational Abstraction | Jun 4, 2023 | Object, Systematic Generalization | Code Available | 0 |
| Learning Visual Abstract Reasoning through Dual-Stream Networks | Nov 29, 2024 | Visual Reasoning | Code Available | 0 |
| FigureQA: An Annotated Figure Dataset for Visual Reasoning | Oct 19, 2017 | BIG-bench Machine Learning, Chart Question Answering | Code Available | 0 |
| TDBench: Benchmarking Vision-Language Models in Understanding Top-Down Images | Apr 1, 2025 | Autonomous Navigation, Benchmarking | Code Available | 0 |
| Techniques for Symbol Grounding with SATNet | Jun 16, 2021 | Logical Reasoning, Visual Reasoning | Code Available | 0 |
| Temporal Reasoning via Audio Question Answering | Nov 21, 2019 | Audio Question Answering, Diagnostic | Code Available | 0 |
| A Dual-Attention Learning Network with Word and Sentence Embedding for Medical Visual Question Answering | Oct 1, 2022 | Medical Visual Question Answering, Question Answering | Code Available | 0 |
| Learning to reason over visual objects | Mar 3, 2023 | Inductive Bias, Visual Reasoning | Code Available | 0 |
| Explainable and Explicit Visual Reasoning over Scene Graphs | Dec 5, 2018 | Inductive Bias, Visual Question Answering (VQA) | Code Available | 0 |
| TGraphX: Tensor-Aware Graph Neural Network for Multi-Dimensional Feature Learning | Apr 4, 2025 | Graph Neural Network, object-detection | Code Available | 0 |
| The Abduction of Sherlock Holmes: A Dataset for Visual Abductive Reasoning | Feb 10, 2022 | Diagnostic, Visual Abductive Reasoning | Code Available | 0 |
| Weakly-supervised Semantic Parsing with Abstract Examples | Nov 14, 2017 | Semantic Parsing, Visual Reasoning | Code Available | 0 |
| Five Points to Check when Comparing Visual Perception in Humans and Machines | Apr 20, 2020 | Decision Making, Object Recognition | Code Available | 0 |
| Enforcing Consistency in Weakly Supervised Semantic Parsing | Jul 13, 2021 | Semantic Parsing, Visual Reasoning | Code Available | 0 |
| Bottom-Up Shift and Reasoning for Referring Image Segmentation | Jun 19, 2021 | Image Segmentation, Segmentation | Code Available | 0 |
| Thinking with Generated Images | May 28, 2025 | Visual Reasoning | Code Available | 0 |
| Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis | Feb 11, 2023 | Image-text Retrieval, Knowledge Graphs | Code Available | 0 |
| Learning to Compose: Improving Object Centric Learning by Injecting Compositionality | May 1, 2024 | Object, Systematic Generalization | Code Available | 0 |
| Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning | Oct 4, 2022 | Image Captioning, Sentence | Code Available | 0 |
| Learning logic programs by discovering higher-order abstractions | Aug 16, 2023 | Inductive logic programming, Program Synthesis | Code Available | 0 |
| Toward Building General Foundation Models for Language, Vision, and Vision-Language Understanding Tasks | Jan 12, 2023 | Cross-Modal Retrieval, Open-Ended Question Answering | Code Available | 0 |
| Toward Multi-Granularity Decision-Making: Explicit Visual Reasoning with Hierarchical Knowledge | Jan 1, 2023 | Decision Making, Question Answering | Code Available | 0 |
| Visual Question Answering From Another Perspective: CLEVR Mental Rotation Tests | Dec 3, 2022 | Question Answering, Visual Question Answering | Code Available | 0 |
| Differentiable Scene Graphs | Feb 26, 2019 | Visual Reasoning | Code Available | 0 |
| Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad? | Oct 25, 2024 | Visual Reasoning | Code Available | 0 |
| Visual Reasoning and Multi-Agent Approach in Multimodal Large Language Models (MLLMs): Solving TSP and mTSP Combinatorial Challenges | Jun 26, 2024 | In-Context Learning, Traveling Salesman Problem | Code Available | 0 |
| Visual Reasoning by Progressive Module Networks | Jun 6, 2018 | Visual Reasoning | Code Available | 0 |
| A Corpus for Reasoning About Natural Language Grounded in Photographs | Nov 1, 2018 | Diversity, Visual Reasoning | Code Available | 0 |