| Modelling Working Memory using Deep Recurrent Reinforcement Learning | Sep 11, 2019 | Decision Making, reinforcement-learning | —Unverified | 0 |
| Modularity Matters: Learning Invariant Relational Reasoning Tasks | Jun 18, 2018 | Mixture-of-Experts, Relational Reasoning | —Unverified | 0 |
| Modulated Self-attention Convolutional Network for VQA | Oct 8, 2019 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Muffin or Chihuahua? Challenging Multimodal Large Language Models with Multipanel VQA | Jan 29, 2024 | Benchmarking, Image Comprehension | —Unverified | 0 |
| Multi-Granularity Modularized Network for Abstract Visual Reasoning | Jul 9, 2020 | Visual Grounding, Visual Reasoning | —Unverified | 0 |
| Multimodal Representations for Teacher-Guided Compositional Visual Reasoning | Oct 24, 2023 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Superpixel Semantics Representation and Pre-training for Vision-Language Task | Oct 20, 2023 | Self-Supervised Learning, Superpixels | —Unverified | 0 |
| Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners | Apr 30, 2024 | 3D visual grounding, Visual Grounding | —Unverified | 0 |
| Navigating to Objects Specified by Images | Apr 3, 2023 | Navigate, Visual Reasoning | —Unverified | 0 |
| Neural-guided, Bidirectional Program Search for Abstraction and Reasoning | Oct 22, 2021 | ARC, Program Synthesis | —Unverified | 0 |
| Neural Structure Mapping For Learning Abstract Visual Analogies | Oct 12, 2021 | Visual Analogies, Visual Reasoning | —Unverified | 0 |
| Neuro-Symbolic Scene Graph Conditioning for Synthetic Image Dataset Generation | Mar 21, 2025 | Dataset Generation, Graph Generation | —Unverified | 0 |
| Neuro-Symbolic Visual Reasoning: Disentangling "Visual" from "Reasoning" | Jun 20, 2020 | Graph Generation, Question Answering | —Unverified | 0 |
| NODE-Adapter: Neural Ordinary Differential Equations for Better Vision-Language Reasoning | Jul 11, 2024 | Domain Generalization, Human-Object Interaction Detection | —Unverified | 0 |
| NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks | Apr 28, 2025 | Task Planning, Vision-Language-Action | —Unverified | 0 |
| Not-So-CLEVR: Visual Relations Strain Feedforward Neural Networks | Jan 1, 2018 | Memorization, Question Answering | —Unverified | 0 |
| NTSEBENCH: Cognitive Reasoning Benchmark for Vision Language Models | Jul 15, 2024 | Common Sense Reasoning, Multiple-choice | —Unverified | 0 |
| Attention over learned object embeddings enables complex visual reasoning | Dec 15, 2020 | Object, Video Object Tracking | —Unverified | 0 |
| Object-Centric Diagnosis of Visual Reasoning | Dec 21, 2020 | Diagnostic, Object | —Unverified | 0 |
| Object Ordering with Bidirectional Matchings for Visual Reasoning | Apr 18, 2018 | Object, Visual Reasoning | —Unverified | 0 |
| OC-NMN: Object-centric Compositional Neural Module Network for Generative Visual Analogical Reasoning | Oct 28, 2023 | Data Augmentation, Out-of-Distribution Generalization | —Unverified | 0 |
| OmniAD: Detect and Understand Industrial Anomaly via Multimodal Reasoning | May 28, 2025 | Anomaly Detection, Multimodal Reasoning | —Unverified | 0 |
| On Data Synthesis and Post-training for Visual Abstract Reasoning | Apr 2, 2025 | Visual Reasoning | —Unverified | 0 |
| One for All: One-stage Referring Expression Comprehension with Dynamic Reasoning | Jul 31, 2022 | All, Referring Expression | —Unverified | 0 |
| Question Guided Modular Routing Networks for Visual Question Answering | Apr 17, 2019 | Question Answering, Visual Question Answering | —Unverified | 0 |
| RAVEN: A Dataset for Relational and Analogical Visual rEasoNing | Mar 7, 2019 | Object Recognition, Question Answering | —Unverified | 0 |
| RBench-V: A Primary Assessment for Visual Reasoning Models with Multi-modal Outputs | May 22, 2025 | Image Manipulation, Math | —Unverified | 0 |
| Reason from Context with Self-supervised Learning | Nov 23, 2022 | Object, Object Recognition | —Unverified | 0 |
| Reasoning Limitations of Multimodal Large Language Models. A case study of Bongard Problems | Nov 2, 2024 | Specificity, Visual Reasoning | —Unverified | 0 |
| Reasoning over Vision and Language: Exploring the Benefits of Supplemental Knowledge | Jan 15, 2021 | Question Answering, Visual Question Answering (VQA) | —Unverified | 0 |
| Recurrent Vision Transformer for Solving Visual Reasoning Problems | Nov 29, 2021 | Object Detection, Visual Reasoning | —Unverified | 0 |
| Replace-then-Perturb: Targeted Adversarial Attacks With Visual Reasoning for Vision-Language Models | Nov 1, 2024 | Adversarial Attack, Contrastive Learning | —Unverified | 0 |
| Rethinking Bottlenecks in Safety Fine-Tuning of Vision Language Models | Jan 30, 2025 | Instruction Following, Visual Reasoning | —Unverified | 0 |
| Retrieving and Highlighting Action with Spatiotemporal Reference | May 19, 2020 | Action Recognition, Cross-Modal Retrieval | —Unverified | 0 |
| Revisiting MLLMs: An In-Depth Analysis of Image Classification Abilities | Dec 21, 2024 | Attribute, Classification | —Unverified | 0 |
| RGB-Th-Bench: A Dense benchmark for Visual-Thermal Understanding of Vision Language Models | Mar 25, 2025 | Image Comprehension, Visual Reasoning | —Unverified | 0 |
| Robust Visual Reasoning via Language Guided Neural Module Networks | Dec 1, 2021 | Question Answering, Referring Expression | —Unverified | 0 |
| Same-different problems strain convolutional neural networks | Feb 9, 2018 | Memorization, Visual Reasoning | —Unverified | 0 |
| SciVerse: Unveiling the Knowledge Comprehension and Visual Reasoning of LMMs on Multi-modal Scientific Problems | Mar 13, 2025 | Visual Reasoning | —Unverified | 0 |
| Seeing is Believing, but How Much? A Comprehensive Analysis of Verbalized Calibration in Vision-Language Models | May 26, 2025 | Uncertainty Quantification, Visual Reasoning | —Unverified | 0 |
| Seeing the Intangible: Survey of Image Classification into High-Level and Abstract Categories | Aug 21, 2023 | Classification, Clustering | —Unverified | 0 |
| SelfEval: Leveraging the discriminative nature of generative models for evaluation | Nov 17, 2023 | Attribute, Visual Reasoning | —Unverified | 0 |
| Self-Segregating and Coordinated-Segregating Transformer for Focused Deep Multi-Modular Network for Visual Question Answering | Jun 25, 2020 | Diversity, Question Answering | —Unverified | 0 |
| Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI | Feb 24, 2025 | document understanding, Multimodal Reasoning | —Unverified | 0 |
| SHOP-VRB: A Visual Reasoning Benchmark for Object Perception | Apr 6, 2020 | Object, Visual Reasoning | —Unverified | 0 |
| Does Acceleration Cause Hidden Instability in Vision Language Models? Uncovering Instance-Level Divergence Through a Large-Scale Empirical Study | Mar 9, 2025 | Quantization, Token Reduction | —Unverified | 0 |
| Simple Token-Level Confidence Improves Caption Correctness | May 11, 2023 | Hallucination, Image Captioning | —Unverified | 0 |
| Slow Perception: Let's Perceive Geometric Figures Step-by-step | Dec 30, 2024 | Math, Visual Reasoning | —Unverified | 0 |
| SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection | Mar 5, 2024 | Concept Alignment, Explanation Generation | —Unverified | 0 |
| Social-IQ: A Question Answering Benchmark for Artificial Social Intelligence | Jun 1, 2019 | Question Answering, Visual Reasoning | —Unverified | 0 |