| Language-Vision Planner and Executor for Text-to-Visual Reasoning | Jun 9, 2025 | In-Context Learning, MME | —Unverified | 0 | 0 |
| FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving | May 23, 2025 | Autonomous Driving, Image Generation | —Unverified | 0 | 0 |
| From Wrong To Right: A Recursive Approach Towards Vision-Language Explanation | Nov 21, 2023 | Explanation Generation, Visual Question Answering (VQA) | —Unverified | 0 | 0 |
| LaViPlan : Language-Guided Visual Path Planning with RLVR | Jul 17, 2025 | Autonomous Driving, Vision-Language-Action | —Unverified | 0 | 0 |
| VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning | Oct 30, 2024 | Benchmarking, Hallucination | —Unverified | 0 | 0 |
| VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning | Dec 3, 2024 | Benchmarking, Visual Reasoning | —Unverified | 0 | 0 |
| From Visual to Acoustic Question Answering | Feb 28, 2019 | Acoustic Question Answering, Position | —Unverified | 0 | 0 |
| VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models | May 26, 2025 | Visual Reasoning | —Unverified | 0 | 0 |
| ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models | Feb 13, 2025 | Visual Reasoning | —Unverified | 0 | 0 |
| What Makes a Maze Look Like a Maze? | Sep 12, 2024 | Visual Reasoning | —Unverified | 0 | 0 |
| Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning | May 20, 2025 | reinforcement-learning, Reinforcement Learning | —Unverified | 0 | 0 |
| From Shallow to Deep: Compositional Reasoning over Graphs for Visual Question Answering | Jun 25, 2022 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration | Mar 17, 2025 | Denoising, Question Answering | —Unverified | 0 | 0 |
| Learning Rope Manipulation Policies Using Dense Object Descriptors Trained on Synthetic Depth Data | Mar 3, 2020 | Robot Manipulation, Visual Reasoning | —Unverified | 0 | 0 |
| Learning to Act Properly: Predicting and Explaining Affordances from Images | Dec 20, 2017 | Visual Reasoning | —Unverified | 0 | 0 |
| Learning to Agree on Vision Attention for Visual Commonsense Reasoning | Feb 4, 2023 | Visual Commonsense Reasoning, Visual Reasoning | —Unverified | 0 | 0 |
| Learning to Collocate Neural Modules for Image Captioning | Apr 18, 2019 | Decoder, Image Captioning | —Unverified | 0 | 0 |
| Are Elephants Bigger than Butterflies? Reasoning about Sizes of Objects | Feb 2, 2016 | Visual Reasoning | —Unverified | 0 | 0 |
| Learning to Compose and Reason with Language Tree Structures for Visual Grounding | Jun 5, 2019 | Visual Grounding, Visual Reasoning | —Unverified | 0 | 0 |
| From Code to Compliance: Assessing ChatGPT's Utility in Designing an Accessible Webpage -- A Case Study | Jan 7, 2025 | Prompt Engineering, Visual Reasoning | —Unverified | 0 | 0 |
| VISREAS: Complex Visual Reasoning with Unanswerable Questions | Feb 23, 2024 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Foundation Models for Zero-Shot Segmentation of Scientific Images without AI-Ready Data | Jun 30, 2025 | Visual Reasoning, Zero Shot Segmentation | —Unverified | 0 | 0 |
| Learning to Reason Iteratively and Parallelly for Complex Visual Reasoning Scenarios | Nov 20, 2024 | Question Answering, Visual Question Answering (VQA) | —Unverified | 0 | 0 |
| Are Disentangled Representations Helpful for Abstract Visual Reasoning? | May 29, 2019 | Disentanglement, Visual Reasoning | —Unverified | 0 | 0 |
| Learning to Stop Overthinking at Test Time | Feb 16, 2025 | Visual Reasoning | —Unverified | 0 | 0 |
| ForgeryGPT: Multimodal Large Language Model For Explainable Image Forgery Detection and Localization | Oct 14, 2024 | Explanation Generation, Image Forgery Detection | —Unverified | 0 | 0 |
| Abstract Visual Reasoning Enabled by Language | Mar 7, 2023 | ARC, Visual Reasoning | —Unverified | 0 | 0 |
| Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering | May 2, 2022 | Decoder, Image Captioning | —Unverified | 0 | 0 |
| Visual Agentic AI for Spatial Reasoning with a Dynamic API | Feb 10, 2025 | Program Synthesis, Spatial Reasoning | —Unverified | 0 | 0 |
| Visual Analytics of Neuron Vulnerability to Adversarial Attacks on Convolutional Neural Networks | Mar 6, 2023 | Autonomous Driving, Medical Diagnosis | —Unverified | 0 | 0 |
| Lexical Conceptual Structure of Literal and Metaphorical Spatial Language: A Case Study of "Push" | Jun 1, 2018 | Machine Translation, Translation | —Unverified | 0 | 0 |
| lilGym: Natural Language Visual Reasoning with Reinforcement Learning | Nov 3, 2022 | reinforcement-learning, Reinforcement Learning | —Unverified | 0 | 0 |
| Filling in the details: Perceiving from low fidelity images | Apr 14, 2016 | Foveation, Visual Reasoning | —Unverified | 0 | 0 |
| Few-shot Visual Reasoning with Meta-analogical Contrastive Learning | Jul 23, 2020 | Contrastive Learning, Logical Reasoning | —Unverified | 0 | 0 |
| Few-shot Subgoal Planning with Language Models | May 28, 2022 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| LLMs Are Not Yet Ready for Deepfake Image Detection | Jun 12, 2025 | DeepFake Detection, Face Swapping | —Unverified | 0 | 0 |
| Localizing Before Answering: A Hallucination Evaluation Benchmark for Grounded Medical Multimodal LLMs | Apr 30, 2025 | Hallucination, Hallucination Evaluation | —Unverified | 0 | 0 |
| LogicAD: Explainable Anomaly Detection via VLM-based Text Feature Extraction | Jan 3, 2025 | Anomaly Detection, Visual Reasoning | —Unverified | 0 | 0 |
| Few-Shot Abstract Visual Reasoning With Spectral Features | Oct 4, 2019 | Few-Shot Learning, Visual Reasoning | —Unverified | 0 | 0 |
| LOIS: Looking Out of Instance Semantics for Visual Question Answering | Jul 26, 2023 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception | Apr 21, 2025 | Math, MMLU | —Unverified | 0 | 0 |
| Look, Remember and Reason: Grounded reasoning in videos with language models | Jun 30, 2023 | Object, object-detection | —Unverified | 0 | 0 |
| An in-depth experimental study of sensor usage and visual reasoning of robots navigating in real environments | Nov 29, 2021 | Benchmarking, Visual Navigation | —Unverified | 0 | 0 |
| LVLM_CSP: Accelerating Large Vision Language Models via Clustering, Scattering, and Pruning for Reasoning Segmentation | Apr 15, 2025 | Image Captioning, Question Answering | —Unverified | 0 | 0 |
| Factorization of View-Object Manifolds for Joint Object Recognition and Pose Estimation | Mar 23, 2015 | Object, Object Recognition | —Unverified | 0 | 0 |
| Eyeballing Combinatorial Problems: A Case Study of Using Multimodal Large Language Models to Solve Traveling Salesman Problems | Jun 11, 2024 | In-Context Learning, Traveling Salesman Problem | —Unverified | 0 | 0 |
| Explicit Knowledge Incorporation for Visual Reasoning | Jun 19, 2021 | Visual Reasoning | —Unverified | 0 | 0 |
| MagiC: Evaluating Multimodal Cognition Toward Grounded Visual Reasoning | Jul 9, 2025 | Diagnostic, Multimodal Reasoning | —Unverified | 0 | 0 |