| Improving Generalization in Visual Reasoning via Self-Ensemble | Oct 28, 2024 | Visual Question Answering (VQA), Visual Reasoning | —Unverified | 0 |
| Improving Scene Graph Classification by Exploiting Knowledge from Texts | Feb 9, 2021 | Classification, General Classification | —Unverified | 0 |
| Incorporating Structured Representations into Pretrained Vision & Language Models Using Scene Graphs | May 10, 2023 | Scene Understanding, Visual Reasoning | —Unverified | 0 |
| INFERNO: Inferring Object-Centric 3D Scene Representations without Supervision | Sep 29, 2021 | Object, Video Object Tracking | —Unverified | 0 |
| Integrating LMM Planners and 3D Skill Policies for Generalizable Manipulation | Jan 30, 2025 | Memorization, Scene Understanding | —Unverified | 0 |
| Interpretable Visual Reasoning via Probabilistic Formulation under Natural Supervision | Aug 1, 2020 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Introduction to Soar | May 8, 2022 | Chunking, Decision Making | —Unverified | 0 |
| Iterative Search for Weakly Supervised Semantic Parsing | Jun 1, 2019 | Semantic Parsing, Visual Reasoning | —Unverified | 0 |
| Iterative Visual Reasoning Beyond Convolutions | Mar 29, 2018 | Visual Reasoning | —Unverified | 0 |
| It's Not About the Journey; It's About the Destination: Following Soft Paths Under Question-Guidance for Visual Reasoning | Jun 1, 2019 | Transfer Learning, Visual Reasoning | —Unverified | 0 |
| Jointly Visual- and Semantic-Aware Graph Memory Networks for Temporal Sentence Localization in Videos | Mar 2, 2023 | Representation Learning, Sentence | —Unverified | 0 |
| 'Just because you are right, doesn't mean I am wrong': Overcoming a bottleneck in development and evaluation of Open-Ended VQA tasks | Apr 1, 2021 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Just Say the Name: Online Continual Learning with Category Names Only via Data Generation | Mar 16, 2024 | Continual Learning, Diversity | —Unverified | 0 |
| KokushiMD-10: Benchmark for Evaluating Large Language Models on Ten Japanese National Healthcare Licensing Examinations | Jun 9, 2025 | Multimodal Reasoning, Visual Reasoning | —Unverified | 0 |
| Language-Conditioned Robotic Manipulation with Fast and Slow Thinking | Jan 8, 2024 | Decision Making, Intent Recognition | —Unverified | 0 |
| Language-Guided Salient Object Ranking | Jan 1, 2025 | Object, Saliency Ranking | —Unverified | 0 |
| Language-Vision Planner and Executor for Text-to-Visual Reasoning | Jun 9, 2025 | In-Context Learning, MME | —Unverified | 0 |
| LaViPlan : Language-Guided Visual Path Planning with RLVR | Jul 17, 2025 | Autonomous Driving, Vision-Language-Action | —Unverified | 0 |
| Learning Rope Manipulation Policies Using Dense Object Descriptors Trained on Synthetic Depth Data | Mar 3, 2020 | Robot Manipulation, Visual Reasoning | —Unverified | 0 |
| Learning to Act Properly: Predicting and Explaining Affordances from Images | Dec 20, 2017 | Visual Reasoning | —Unverified | 0 |
| Learning to Agree on Vision Attention for Visual Commonsense Reasoning | Feb 4, 2023 | Visual Commonsense Reasoning, Visual Reasoning | —Unverified | 0 |
| Learning to Collocate Neural Modules for Image Captioning | Apr 18, 2019 | Decoder, Image Captioning | —Unverified | 0 |
| Learning to Compose and Reason with Language Tree Structures for Visual Grounding | Jun 5, 2019 | Visual Grounding, Visual Reasoning | —Unverified | 0 |
| Learning to Reason Iteratively and Parallelly for Complex Visual Reasoning Scenarios | Nov 20, 2024 | Question Answering, Visual Question Answering (VQA) | —Unverified | 0 |
| Learning to Stop Overthinking at Test Time | Feb 16, 2025 | Visual Reasoning | —Unverified | 0 |
| Lexical Conceptual Structure of Literal and Metaphorical Spatial Language: A Case Study of "Push" | Jun 1, 2018 | Machine Translation, Translation | —Unverified | 0 |
| lilGym: Natural Language Visual Reasoning with Reinforcement Learning | Nov 3, 2022 | reinforcement-learning, Reinforcement Learning | —Unverified | 0 |
| LLMs Are Not Yet Ready for Deepfake Image Detection | Jun 12, 2025 | DeepFake Detection, Face Swapping | —Unverified | 0 |
| Localizing Before Answering: A Hallucination Evaluation Benchmark for Grounded Medical Multimodal LLMs | Apr 30, 2025 | Hallucination, Hallucination Evaluation | —Unverified | 0 |
| LogicAD: Explainable Anomaly Detection via VLM-based Text Feature Extraction | Jan 3, 2025 | Anomaly Detection, Visual Reasoning | —Unverified | 0 |
| LOIS: Looking Out of Instance Semantics for Visual Question Answering | Jul 26, 2023 | Question Answering, Visual Question Answering | —Unverified | 0 |
| LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception | Apr 21, 2025 | Math, MMLU | —Unverified | 0 |
| Look, Remember and Reason: Grounded reasoning in videos with language models | Jun 30, 2023 | Object, object-detection | —Unverified | 0 |
| LVLM_CSP: Accelerating Large Vision Language Models via Clustering, Scattering, and Pruning for Reasoning Segmentation | Apr 15, 2025 | Image Captioning, Question Answering | —Unverified | 0 |
| MagiC: Evaluating Multimodal Cognition Toward Grounded Visual Reasoning | Jul 9, 2025 | Diagnostic, Multimodal Reasoning | —Unverified | 0 |
| Making History Matter: History-Advantage Sequence Training for Visual Dialog | Feb 25, 2019 | Answer Generation, Decoder | —Unverified | 0 |
| MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning | Oct 9, 2022 | Image-text Retrieval, multimodal interaction | —Unverified | 0 |
| MAPS: Advancing Multi-Modal Reasoning in Expert-Level Physical Science | Jan 18, 2025 | Visual Reasoning | —Unverified | 0 |
| MATP-BENCH: Can MLLM Be a Good Automated Theorem Prover for Multimodal Problems? | Jun 6, 2025 | Automated Theorem Proving, Visual Reasoning | —Unverified | 0 |
| Measuring CLEVRness: Black-box Testing of Visual Reasoning Models | Sep 29, 2021 | Benchmarking, Diagnostic | —Unverified | 0 |
| Measuring CLEVRness: Blackbox testing of Visual Reasoning Models | Feb 24, 2022 | Benchmarking, Diagnostic | —Unverified | 0 |
| MET-Bench: Multimodal Entity Tracking for Evaluating the Limitations of Vision-Language and Reasoning Models | Feb 15, 2025 | Natural Language Understanding, Visual Reasoning | —Unverified | 0 |
| MiCo: Multi-image Contrast for Reinforcement Visual Reasoning | Jun 27, 2025 | Logical Reasoning, Representation Learning | —Unverified | 0 |
| MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM | May 30, 2025 | Hallucination, Multimodal Reasoning | —Unverified | 0 |
| M-LLM Based Video Frame Selection for Efficient Video Understanding | Feb 27, 2025 | EgoSchema, Language Modeling | —Unverified | 0 |
| MMCTAgent: Multi-modal Critical Thinking Agent Framework for Complex Visual Reasoning | May 28, 2024 | Decision Making, Video Understanding | —Unverified | 0 |
| MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct | Sep 9, 2024 | Diversity, Visual Reasoning | —Unverified | 0 |
| MMRo: Are Multimodal LLMs Eligible as the Brain for In-Home Robotics? | Jun 28, 2024 | Task Planning, Visual Reasoning | —Unverified | 0 |
| MMSciBench: Benchmarking Language Models on Multimodal Scientific Problems | Feb 27, 2025 | Benchmarking, Visual Reasoning | —Unverified | 0 |
| Modeling Gestalt Visual Reasoning on the Raven's Progressive Matrices Intelligence Test Using Generative Image Inpainting Techniques | Nov 18, 2019 | Image Inpainting, Visual Reasoning | —Unverified | 0 |