| VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity | Mar 14, 2025 | Benchmarking, Decision Making | —Unverified | 0 | 0 |
| Towards Explainable Neural-Symbolic Visual Reasoning | Sep 19, 2019 | Explainable artificial intelligence, Explainable Artificial Intelligence (XAI) | —Unverified | 0 | 0 |
| HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model | Jun 1, 2024 | Action Recognition, Activity Recognition | —Unverified | 0 | 0 |
| HalluSegBench: Counterfactual Visual Reasoning for Segmentation Hallucination Evaluation | Jun 26, 2025 | counterfactual, Counterfactual Reasoning | —Unverified | 0 | 0 |
| Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning | Jun 8, 2025 | Attribute, Hallucination | —Unverified | 0 | 0 |
| VGR: Visual Grounded Reasoning | Jun 13, 2025 | Large Language Model, Math | —Unverified | 0 | 0 |
| Guiding Visual Question Answering with Attention Priors | May 25, 2022 | Question Answering, Visual Grounding | —Unverified | 0 | 0 |
| GSR-BENCH: A Benchmark for Grounded Spatial Reasoning Evaluation via Multimodal LLMs | Jun 19, 2024 | Spatial Reasoning, Visual Reasoning | —Unverified | 0 | 0 |
| I Know About "Up"! Enhancing Spatial Reasoning in Visual Language Models Through 3D Reconstruction | Jul 19, 2024 | 3D Reconstruction, Spatial Reasoning | —Unverified | 0 | 0 |
| Abstract Diagrammatic Reasoning with Multiplex Graph Networks | Jun 19, 2020 | Graph Neural Network, Visual Reasoning | —Unverified | 0 | 0 |
| Image as a Foreign Language: BEiT Pretraining for Vision and Vision-Language Tasks | Jan 1, 2023 | Cross-Modal Retrieval, Image Captioning | —Unverified | 0 | 0 |
| Image-of-Thought Prompting for Visual Reasoning Refinement in Multimodal Large Language Models | May 22, 2024 | Multimodal Reasoning, Visual Question Answering | —Unverified | 0 | 0 |
| GSON: A Group-based Social Navigation Framework with Large Multimodal Model | Sep 26, 2024 | Autonomous Vehicles, Motion Planning | —Unverified | 0 | 0 |
| Video Captioning Using Weak Annotation | Sep 2, 2020 | Sentence, Video Captioning | —Unverified | 0 | 0 |
| Ground-R1: Incentivizing Grounded Visual Reasoning via Reinforcement Learning | May 26, 2025 | reinforcement-learning, Reinforcement Learning | —Unverified | 0 | 0 |
| Impact of ML Optimization Tactics on Greener Pre-Trained ML Models | Sep 19, 2024 | GPU, image-classification | —Unverified | 0 | 0 |
| Compromising Embodied Agents with Contextual Backdoor Attacks | Aug 6, 2024 | Autonomous Driving, Robot Manipulation | —Unverified | 0 | 0 |
| Improving Generalization in Visual Reasoning via Self-Ensemble | Oct 28, 2024 | Visual Question Answering (VQA), Visual Reasoning | —Unverified | 0 | 0 |
| Improving Scene Graph Classification by Exploiting Knowledge from Texts | Feb 9, 2021 | Classification, General Classification | —Unverified | 0 | 0 |
| Incorporating Structured Representations into Pretrained Vision & Language Models Using Scene Graphs | May 10, 2023 | Scene Understanding, Visual Reasoning | —Unverified | 0 | 0 |
| Grounding Physical Object and Event Concepts Through Dynamic Visual Reasoning | Jan 1, 2021 | counterfactual, Object | —Unverified | 0 | 0 |
| INFERNO: Inferring Object-Centric 3D Scene Representations without Supervision | Sep 29, 2021 | Object, Video Object Tracking | —Unverified | 0 | 0 |
| A Survey on Multimodal Large Language Models | Jun 23, 2023 | Hallucination, In-Context Learning | —Unverified | 0 | 0 |
| Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning | Mar 30, 2021 | counterfactual, Object | —Unverified | 0 | 0 |
| Grounded Reinforcement Learning for Visual Reasoning | May 29, 2025 | reinforcement-learning, Reinforcement Learning | —Unverified | 0 | 0 |
| Integrating LMM Planners and 3D Skill Policies for Generalizable Manipulation | Jan 30, 2025 | Memorization, Scene Understanding | —Unverified | 0 | 0 |
| GRIT: Teaching MLLMs to Think with Images | May 21, 2025 | Reinforcement Learning (RL), Visual Reasoning | —Unverified | 0 | 0 |
| Graph Representation for Order-Aware Visual Transformation | Jan 1, 2023 | Visual Reasoning | —Unverified | 0 | 0 |
| Interpretable Visual Reasoning via Probabilistic Formulation under Natural Supervision | Aug 1, 2020 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| ViLEM: Visual-Language Error Modeling for Image-Text Retrieval | Jan 1, 2023 | Contrastive Learning, Image-text Retrieval | —Unverified | 0 | 0 |
| Grammar-Based Grounded Lexicon Learning | Feb 17, 2022 | Network Embedding, Sentence | —Unverified | 0 | 0 |
| Introduction to Soar | May 8, 2022 | Chunking, Decision Making | —Unverified | 0 | 0 |
| A survey on knowledge-enhanced multimodal learning | Nov 19, 2022 | Conditional Image Generation, Factual Visual Question Answering | —Unverified | 0 | 0 |
| GenVP: Generating Visual Puzzles with Contrastive Hierarchical VAEs | Mar 30, 2025 | Visual Reasoning | —Unverified | 0 | 0 |
| Generate Subgoal Images before Act: Unlocking the Chain-of-Thought Reasoning in Diffusion Model for Robot Manipulation with Multimodal Prompts | Jan 1, 2024 | Image Generation, Instruction Following | —Unverified | 0 | 0 |
| Iterative Search for Weakly Supervised Semantic Parsing | Jun 1, 2019 | Semantic Parsing, Visual Reasoning | —Unverified | 0 | 0 |
| Iterative Visual Reasoning Beyond Convolutions | Mar 29, 2018 | Visual Reasoning | —Unverified | 0 | 0 |
| It's Not About the Journey; It's About the Destination: Following Soft Paths Under Question-Guidance for Visual Reasoning | Jun 1, 2019 | Transfer Learning, Visual Reasoning | —Unverified | 0 | 0 |
| A Survey of Slow Thinking-based Reasoning LLMs using Reinforced Learning and Inference-time Scaling Law | May 5, 2025 | Math, Medical Diagnosis | —Unverified | 0 | 0 |
| Jointly Visual- and Semantic-Aware Graph Memory Networks for Temporal Sentence Localization in Videos | Mar 2, 2023 | Representation Learning, Sentence | —Unverified | 0 | 0 |
| ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling | Aug 7, 2024 | Attribute, Language Modeling | —Unverified | 0 | 0 |
| ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning | Aug 5, 2024 | Visual Reasoning | —Unverified | 0 | 0 |
| 'Just because you are right, doesn't mean I am wrong': Overcoming a bottleneck in development and evaluation of Open-Ended VQA tasks | Apr 1, 2021 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Just Say the Name: Online Continual Learning with Category Names Only via Data Generation | Mar 16, 2024 | Continual Learning, Diversity | —Unverified | 0 | 0 |
| A-I-RAVEN and I-RAVEN-Mesh: Two New Benchmarks for Abstract Visual Reasoning | Jun 16, 2024 | Transfer Learning, Visual Reasoning | —Unverified | 0 | 0 |
| GAM-Agent: Game-Theoretic and Uncertainty-Aware Collaboration for Complex Visual Reasoning | May 29, 2025 | Multimodal Reasoning, MVBench | —Unverified | 0 | 0 |
| A Review of Emerging Research Directions in Abstract Visual Reasoning | Feb 21, 2022 | Visual Reasoning | —Unverified | 0 | 0 |
| KokushiMD-10: Benchmark for Evaluating Large Language Models on Ten Japanese National Healthcare Licensing Examinations | Jun 9, 2025 | Multimodal Reasoning, Visual Reasoning | —Unverified | 0 | 0 |
| Language-Conditioned Robotic Manipulation with Fast and Slow Thinking | Jan 8, 2024 | Decision Making, Intent Recognition | —Unverified | 0 | 0 |
| Language-Guided Salient Object Ranking | Jan 1, 2025 | Object, Saliency Ranking | —Unverified | 0 | 0 |