| Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners | Apr 30, 2024 | 3D visual grounding, Visual Grounding | —Unverified | 0 | 0 |
| Navigating to Objects Specified by Images | Apr 3, 2023 | Navigate, Visual Reasoning | —Unverified | 0 | 0 |
| Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language | Oct 28, 2021 | counterfactual, Visual Reasoning | —Unverified | 0 | 0 |
| Neural-guided, Bidirectional Program Search for Abstraction and Reasoning | Oct 22, 2021 | ARC, Program Synthesis | —Unverified | 0 | 0 |
| Dynamic Graph Attention for Referring Expression Comprehension | Sep 18, 2019 | Graph Attention, Referring Expression | —Unverified | 0 | 0 |
| Neural Structure Mapping For Learning Abstract Visual Analogies | Oct 12, 2021 | Visual Analogies, Visual Reasoning | —Unverified | 0 | 0 |
| DWIM: Towards Tool-aware Visual Reasoning via Discrepancy-aware Workflow Generation & Instruct-Masking Tuning | Mar 25, 2025 | Visual Reasoning | —Unverified | 0 | 0 |
| Neuro-Symbolic Scene Graph Conditioning for Synthetic Image Dataset Generation | Mar 21, 2025 | Dataset Generation, Graph Generation | —Unverified | 0 | 0 |
| Neuro-Symbolic Visual Reasoning: Disentangling "Visual" from "Reasoning" | Jun 20, 2020 | Graph Generation, Question Answering | —Unverified | 0 | 0 |
| Dual Local-Global Contextual Pathways for Recognition in Aerial Imagery | May 18, 2016 | Object Recognition, Road Segmentation | —Unverified | 0 | 0 |
| VisualPuzzles: Decoupling Multimodal Reasoning Evaluation from Domain Knowledge | Apr 14, 2025 | Logical Reasoning, Multimodal Reasoning | —Unverified | 0 | 0 |
| NODE-Adapter: Neural Ordinary Differential Equations for Better Vision-Language Reasoning | Jul 11, 2024 | Domain Generalization, Human-Object Interaction Detection | —Unverified | 0 | 0 |
| DRIVINGVQA: Analyzing Visual Chain-of-Thought Reasoning of Vision Language Models in Real-World Scenarios with Driving Theory Tests | Jan 8, 2025 | Multimodal Reasoning, Multiple-choice | —Unverified | 0 | 0 |
| NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks | Apr 28, 2025 | Task Planning, Vision-Language-Action | —Unverified | 0 | 0 |
| Not-So-CLEVR: Visual Relations Strain Feedforward Neural Networks | Jan 1, 2018 | Memorization, Question Answering | —Unverified | 0 | 0 |
| Doxing via the Lens: Revealing Location-related Privacy Leakage on Multi-modal Large Reasoning Models | Apr 27, 2025 | Visual Reasoning, World Knowledge | —Unverified | 0 | 0 |
| NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models | Jul 15, 2024 | Common Sense Reasoning, Multiple-choice | —Unverified | 0 | 0 |
| Attention over learned object embeddings enables complex visual reasoning | Dec 15, 2020 | Object, Video Object Tracking | —Unverified | 0 | 0 |
| Object-Centric Diagnosis of Visual Reasoning | Dec 21, 2020 | Diagnostic, Object | —Unverified | 0 | 0 |
| Advancing Object Detection in Transportation with Multimodal Large Language Models (MLLMs): A Comprehensive Review and Empirical Testing | Sep 26, 2024 | Event Detection, Object | —Unverified | 0 | 0 |
| Object Ordering with Bidirectional Matchings for Visual Reasoning | Apr 18, 2018 | Object, Visual Reasoning | —Unverified | 0 | 0 |
| OC-NMN: Object-centric Compositional Neural Module Network for Generative Visual Analogical Reasoning | Oct 28, 2023 | Data Augmentation, Out-of-Distribution Generalization | —Unverified | 0 | 0 |
| 3D Concept Learning and Reasoning from Multi-View Images | Mar 20, 2023 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Visual Question Answering in the Medical Domain | Sep 20, 2023 | Contrastive Learning, Medical Visual Question Answering | —Unverified | 0 | 0 |
| OmniAD: Detect and Understand Industrial Anomaly via Multimodal Reasoning | May 28, 2025 | Anomaly Detection, Multimodal Reasoning | —Unverified | 0 | 0 |
| Do we Really Need Visual Instructions? Towards Visual Instruction-Free Fine-tuning for Large Vision-Language Models | Feb 17, 2025 | Instruction Following, visual instruction following | —Unverified | 0 | 0 |
| On Data Synthesis and Post-training for Visual Abstract Reasoning | Apr 2, 2025 | Visual Reasoning | —Unverified | 0 | 0 |
| One for All: One-stage Referring Expression Comprehension with Dynamic Reasoning | Jul 31, 2022 | All, Referring Expression | —Unverified | 0 | 0 |
| One RL to See Them All: Visual Triple Unified Reinforcement Learning | May 23, 2025 | All, Math | —Unverified | 0 | 0 |
| 3D Concept Grounding on Neural Fields | Jul 13, 2022 | Instance Segmentation, Question Answering | —Unverified | 0 | 0 |
| Zero-Shot Visual Reasoning by Vision-Language Models: Benchmarking and Analysis | Aug 27, 2024 | Benchmarking, Large Language Model | —Unverified | 0 | 0 |
| Two-stage Rule-induction Visual Reasoning on RPMs with an Application to Video Prediction | Nov 24, 2021 | Logical Reasoning, Video Prediction | —Unverified | 0 | 0 |
| On the Potential of CLIP for Compositional Logical Reasoning | Aug 30, 2023 | Logical Reasoning, Visual Reasoning | —Unverified | 0 | 0 |
| Do Vision-Language Transformers Exhibit Visual Commonsense? An Empirical Study of VCR | May 27, 2024 | Question Answering, TAG | —Unverified | 0 | 0 |
| Does Visual Pretraining Help End-to-End Reasoning? | Jul 17, 2023 | image-classification, Image Classification | —Unverified | 0 | 0 |
| Open Set Video HOI detection from Action-Centric Chain-of-Look Prompting | Jan 1, 2023 | Human-Object Interaction Detection, Language Modelling | —Unverified | 0 | 0 |
| Does Structural Attention Improve Compositional Representations in Vision-Language Models? | Dec 3, 2022 | Visual Reasoning | —Unverified | 0 | 0 |
| Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning | Jul 7, 2025 | Reinforcement Learning (RL), Visual Reasoning | —Unverified | 0 | 0 |
| Open Visual Knowledge Extraction via Relation-Oriented Multimodality Model Prompting | Oct 28, 2023 | Relation, Visual Reasoning | —Unverified | 0 | 0 |
| Open-World Visual Reasoning by a Neuro-Symbolic Program of Zero-Shot Symbols | Jul 18, 2024 | Visual Reasoning | —Unverified | 0 | 0 |
| Visual Reasoning Evaluation of Grok, Deepseek Janus, Gemini, Qwen, Mistral, and ChatGPT | Feb 23, 2025 | Bias Detection, Visual Reasoning | —Unverified | 0 | 0 |
| Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? | Apr 18, 2025 | Math, Visual Reasoning | —Unverified | 0 | 0 |
| PaLI: A Jointly-Scaled Multilingual Language-Image Model | Sep 14, 2022 | Decoder, Few-Shot Image Classification | —Unverified | 0 | 0 |
| Doc-CoB: Enhancing Multi-Modal Document Understanding with Visual Chain-of-Boxes Reasoning | May 24, 2025 | document understanding, Visual Reasoning | —Unverified | 0 | 0 |
| Disentangling Knowledge-based and Visual Reasoning by Question Decomposition in KB-VQA | Jun 27, 2024 | General Knowledge, Question Answering | —Unverified | 0 | 0 |
| Deep Visual Reasoning: Learning to Predict Action Sequences for Task and Motion Planning from an Initial Scene Image | Jun 9, 2020 | Motion Planning, Task and Motion Planning | —Unverified | 0 | 0 |
| Perception Tokens Enhance Visual Reasoning in Multimodal Language Models | Dec 4, 2024 | Depth Estimation, object-detection | —Unverified | 0 | 0 |
| PhD Thesis: Exploring the role of (self-)attention in cognitive and computer vision architecture | Jun 26, 2023 | Visual Reasoning, Zero-shot Generalization | —Unverified | 0 | 0 |
| Deep Reason: A Strong Baseline for Real-World Visual Reasoning | May 24, 2019 | Visual Reasoning | —Unverified | 0 | 0 |
| Advancing Generalization Across a Variety of Abstract Visual Reasoning Tasks | May 19, 2025 | Visual Reasoning | —Unverified | 0 | 0 |