| End-to-End Learning of Semantic Grasping | Jul 6, 2017 | Object, object-detection | —Unverified | 0 |
| Enhancing Advanced Visual Reasoning Ability of Large Language Models | Sep 21, 2024 | In-Context Learning, Visual Reasoning | —Unverified | 0 |
| Enhancing Visual Reasoning with Autonomous Imagination in Multimodal Large Language Models | Nov 27, 2024 | Visual Reasoning | —Unverified | 0 |
| Interactive Visual Reasoning under Uncertainty | Jun 18, 2022 | Visual Reasoning | —Unverified | 0 |
| EuclidNet: Deep Visual Reasoning for Constructible Problems in Geometry | Dec 27, 2022 | Automated Theorem Proving, Visual Reasoning | —Unverified | 0 |
| Evaluating MLLMs with Multimodal Multi-image Reasoning Benchmark | Jun 4, 2025 | Sentence, Visual Reasoning | —Unverified | 0 |
| Evaluating Visual and Cultural Interpretation: The K-Viscuit Benchmark with Human-VLM Collaboration | Jun 24, 2024 | Diversity, Multiple-choice | —Unverified | 0 |
| Leveraging VLM-Based Pipelines to Annotate 3D Objects | Nov 29, 2023 | In-Context Learning, Language Modeling | —Unverified | 0 |
| EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE | Aug 23, 2023 | Image-text matching, Image-text Retrieval | —Unverified | 0 |
| EXCLAIM: An Explainable Cross-Modal Agentic System for Misinformation Detection with Hierarchical Retrieval | Mar 1, 2025 | Explanation Generation, Misinformation | —Unverified | 0 |
| Learning to Assemble Neural Module Tree Networks for Visual Grounding | Dec 8, 2018 | Dependency Parsing, Natural Language Visual Grounding | —Unverified | 0 |
| Explainable AI And Visual Reasoning: Insights From Radiology | Apr 6, 2023 | Diagnostic, Explainable Artificial Intelligence (XAI) | —Unverified | 0 |
| Explicit3D: Graph Network with Spatial Inference for Single Image 3D Object Detection | Feb 13, 2023 | 3D Object Detection, Graph Generation | —Unverified | 0 |
| Explicit Knowledge Incorporation for Visual Reasoning | Jun 19, 2021 | Visual Reasoning | —Unverified | 0 |
| Eyeballing Combinatorial Problems: A Case Study of Using Multimodal Large Language Models to Solve Traveling Salesman Problems | Jun 11, 2024 | In-Context Learning, Traveling Salesman Problem | —Unverified | 0 |
| Factorization of View-Object Manifolds for Joint Object Recognition and Pose Estimation | Mar 23, 2015 | Object, Object Recognition | —Unverified | 0 |
| Few-Shot Abstract Visual Reasoning With Spectral Features | Oct 4, 2019 | Few-Shot Learning, Visual Reasoning | —Unverified | 0 |
| Few-shot Subgoal Planning with Language Models | May 28, 2022 | Language Modeling, Language Modelling | —Unverified | 0 |
| Few-shot Visual Reasoning with Meta-analogical Contrastive Learning | Jul 23, 2020 | Contrastive Learning, Logical Reasoning | —Unverified | 0 |
| Filling in the details: Perceiving from low fidelity images | Apr 14, 2016 | Foveation, Visual Reasoning | —Unverified | 0 |
| ForgeryGPT: Multimodal Large Language Model For Explainable Image Forgery Detection and Localization | Oct 14, 2024 | Explanation Generation, Image Forgery Detection | —Unverified | 0 |
| Foundation Models for Zero-Shot Segmentation of Scientific Images without AI-Ready Data | Jun 30, 2025 | Visual Reasoning, Zero Shot Segmentation | —Unverified | 0 |
| From Code to Compliance: Assessing ChatGPT's Utility in Designing an Accessible Webpage -- A Case Study | Jan 7, 2025 | Prompt Engineering, Visual Reasoning | —Unverified | 0 |
| From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration | Mar 17, 2025 | Denoising, Question Answering | —Unverified | 0 |
| From Shallow to Deep: Compositional Reasoning over Graphs for Visual Question Answering | Jun 25, 2022 | Question Answering, Visual Question Answering | —Unverified | 0 |
| From Visual to Acoustic Question Answering | Feb 28, 2019 | Acoustic Question Answering, Position | —Unverified | 0 |
| From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation | Nov 21, 2023 | Explanation Generation, Visual Question Answering (VQA) | —Unverified | 0 |
| FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving | May 23, 2025 | Autonomous Driving, Image Generation | —Unverified | 0 |
| GAM-Agent: Game-Theoretic and Uncertainty-Aware Collaboration for Complex Visual Reasoning | May 29, 2025 | Multimodal Reasoning, MVBench | —Unverified | 0 |
| A-I-RAVEN and I-RAVEN-Mesh: Two New Benchmarks for Abstract Visual Reasoning | Jun 16, 2024 | Transfer Learning, Visual Reasoning | —Unverified | 0 |
| Generate Subgoal Images before Act: Unlocking the Chain-of-Thought Reasoning in Diffusion Model for Robot Manipulation with Multimodal Prompts | Jan 1, 2024 | Image Generation, Instruction Following | —Unverified | 0 |
| GenVP: Generating Visual Puzzles with Contrastive Hierarchical VAEs | Mar 30, 2025 | Visual Reasoning | —Unverified | 0 |
| Grammar-Based Grounded Lexicon Learning | Feb 17, 2022 | Network Embedding, Sentence | —Unverified | 0 |
| Graph Representation for Order-Aware Visual Transformation | Jan 1, 2023 | Visual Reasoning | —Unverified | 0 |
| GRIT: Teaching MLLMs to Think with Images | May 21, 2025 | Reinforcement Learning (RL), Visual Reasoning | —Unverified | 0 |
| Grounded Reinforcement Learning for Visual Reasoning | May 29, 2025 | reinforcement-learning, Reinforcement Learning | —Unverified | 0 |
| Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning | Mar 30, 2021 | counterfactual, Object | —Unverified | 0 |
| Grounding Physical Object and Event Concepts Through Dynamic Visual Reasoning | Jan 1, 2021 | counterfactual, Object | —Unverified | 0 |
| Ground-R1: Incentivizing Grounded Visual Reasoning via Reinforcement Learning | May 26, 2025 | reinforcement-learning, Reinforcement Learning | —Unverified | 0 |
| GSON: A Group-based Social Navigation Framework with Large Multimodal Model | Sep 26, 2024 | Autonomous Vehicles, Motion Planning | —Unverified | 0 |
| GSR-BENCH: A Benchmark for Grounded Spatial Reasoning Evaluation via Multimodal LLMs | Jun 19, 2024 | Spatial Reasoning, Visual Reasoning | —Unverified | 0 |
| Guiding Visual Question Answering with Attention Priors | May 25, 2022 | Question Answering, Visual Grounding | —Unverified | 0 |
| Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning | Jun 8, 2025 | Attribute, Hallucination | —Unverified | 0 |
| HalluSegBench: Counterfactual Visual Reasoning for Segmentation Hallucination Evaluation | Jun 26, 2025 | counterfactual, Counterfactual Reasoning | —Unverified | 0 |
| HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model | Jun 1, 2024 | Action Recognition, Activity Recognition | —Unverified | 0 |
| Towards Explainable Neural-Symbolic Visual Reasoning | Sep 19, 2019 | Explainable artificial intelligence, Explainable Artificial Intelligence (XAI) | —Unverified | 0 |
| I Know About "Up"! Enhancing Spatial Reasoning in Visual Language Models Through 3D Reconstruction | Jul 19, 2024 | 3D Reconstruction, Spatial Reasoning | —Unverified | 0 |
| Image as a Foreign Language: BEiT Pretraining for Vision and Vision-Language Tasks | Jan 1, 2023 | Cross-Modal Retrieval, Image Captioning | —Unverified | 0 |
| Image-of-Thought Prompting for Visual Reasoning Refinement in Multimodal Large Language Models | May 22, 2024 | Multimodal Reasoning, Visual Question Answering | —Unverified | 0 |
| Impact of ML Optimization Tactics on Greener Pre-Trained ML Models | Sep 19, 2024 | GPU, image-classification | —Unverified | 0 |