| Socratic-MCTS: Test-Time Visual Reasoning by Asking the Right Questions | Jun 10, 2025 | Visual Reasoning | —Unverified | 0 |
| Spatial Knowledge Distillation to aid Visual Reasoning | Dec 10, 2018 | Diagnostic, Knowledge Distillation | —Unverified | 0 |
| SwitchCIT: Switching for Continual Instruction Tuning | Jul 16, 2024 | Text Generation, Visual Reasoning | —Unverified | 0 |
| Synthetic Visual Genome | Jun 9, 2025 | Referring Expression, Referring Expression Comprehension | —Unverified | 0 |
| SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis | Jun 2, 2025 | 8k, Math | —Unverified | 0 |
| Systematic Abductive Reasoning via Diverse Relation Representations in Vector-symbolic Architecture | Jan 21, 2025 | Attribute, Diversity | —Unverified | 0 |
| Take A Step Back: Rethinking the Two Stages in Visual Reasoning | Jul 29, 2024 | Logical Reasoning, Question Answering | —Unverified | 0 |
| Test-time Distribution Learning Adapter for Cross-modal Visual Reasoning | Mar 10, 2024 | Human-Object Interaction Detection, Prediction | —Unverified | 0 |
| TextCaps: a Dataset for Image Captioning with Reading Comprehension | Mar 24, 2020 | Image Captioning, Optical Character Recognition | —Unverified | 0 |
| The Eye of Sherlock Holmes: Uncovering User Private Attribute Profiling via Vision-Language Model Agentic Framework | May 25, 2025 | Attribute, Language Modeling | —Unverified | 0 |
| The Role of Chain-of-Thought in Complex Vision-Language Reasoning Task | Nov 15, 2023 | Visual Reasoning | —Unverified | 0 |
| The role of object-centric representations, guided attention, and external memory on generalizing visual relations | Apr 14, 2023 | Relation, Visual Reasoning | —Unverified | 0 |
| Think-Program-reCtify: 3D Situated Reasoning with Large Language Models | Apr 23, 2024 | Visual Reasoning | —Unverified | 0 |
| Towards A Unified Neural Architecture for Visual Recognition and Reasoning | Nov 10, 2023 | Object, object-detection | —Unverified | 0 |
| Towards Generative Abstract Reasoning: Completing Raven's Progressive Matrix via Rule Abstraction and Selection | Jan 18, 2024 | Answer Generation, Attribute | —Unverified | 0 |
| Towards Grounded Visual Spatial Reasoning in Multi-Modal Vision Language Models | Aug 18, 2023 | Image-text matching, Object Localization | —Unverified | 0 |
| Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers | Jan 3, 2024 | Question Answering, Visual Grounding | —Unverified | 0 |
| Towards Unsupervised Visual Reasoning: Do Off-The-Shelf Features Know How to Reason? | Dec 20, 2022 | Question Answering, Representation Learning | —Unverified | 0 |
| Towards Visual Discrimination and Reasoning of Real-World Physical Dynamics: Physics-Grounded Anomaly Detection | Mar 5, 2025 | Anomaly Detection, Object | —Unverified | 0 |
| Transfer Learning in Visual and Relational Reasoning | Nov 27, 2019 | Question Answering, Relational Reasoning | —Unverified | 0 |
| Transformers in Vision: A Survey | Jan 4, 2021 | Action Recognition, Activity Recognition | —Unverified | 0 |
| Transformers Utilization in Chart Understanding: A Review of Recent Advances & Future Trends | Oct 5, 2024 | Benchmarking, Chart Understanding | —Unverified | 0 |
| Tree-of-Mixed-Thought: Combining Fast and Slow Thinking for Multi-hop Visual Reasoning | Aug 18, 2023 | Visual Reasoning | —Unverified | 0 |
| TRRNet: Tiered Relation Reasoning for Compositional Visual Question Answering | Aug 1, 2020 | Object, Question Answering | —Unverified | 0 |
| TVBench: Redesigning Video-Language Evaluation | Oct 10, 2024 | Multiple-choice, Open-Ended Question Answering | —Unverified | 0 |
| Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning | Mar 10, 2023 | Few-Shot Image Classification, image-classification | —Unverified | 0 |
| Understanding the computational demands underlying visual reasoning | Aug 8, 2021 | Visual Reasoning | —Unverified | 0 |
| Understand, Think, and Answer: Advancing Visual Reasoning with Large Multimodal Models | May 27, 2025 | Question Answering, Visual Reasoning | —Unverified | 0 |
| Unifying Vision-Language Representation Space with Single-tower Transformer | Nov 21, 2022 | Contrastive Learning, Object Localization | —Unverified | 0 |
| Grounded Object Centric Learning | Jul 18, 2023 | Object, Object Discovery | —Unverified | 0 |
| VALSE: A Task-Independent Benchmark for Vision and Language Models centered on Linguistic Phenomena | Aug 17, 2021 | Question Answering, Visual Question Answering | —Unverified | 0 |
| VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity | Mar 14, 2025 | Benchmarking, Decision Making | —Unverified | 0 |
| VGR: Visual Grounded Reasoning | Jun 13, 2025 | Large Language Model, Math | —Unverified | 0 |
| Video Captioning Using Weak Annotation | Sep 2, 2020 | Sentence, Video Captioning | —Unverified | 0 |
| ViLEM: Visual-Language Error Modeling for Image-Text Retrieval | Jan 1, 2023 | Contrastive Learning, Image-text Retrieval | —Unverified | 0 |
| VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning | Oct 30, 2024 | Benchmarking, Hallucination | —Unverified | 0 |
| VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning | Dec 3, 2024 | Benchmarking, Visual Reasoning | —Unverified | 0 |
| VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models | May 26, 2025 | Visual Reasoning | —Unverified | 0 |
| Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning | May 20, 2025 | reinforcement-learning, Reinforcement Learning | —Unverified | 0 |
| VISREAS: Complex Visual Reasoning with Unanswerable Questions | Feb 23, 2024 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Abstract Visual Reasoning Enabled by Language | Mar 7, 2023 | ARC, Visual Reasoning | —Unverified | 0 |
| Visual Agentic AI for Spatial Reasoning with a Dynamic API | Feb 10, 2025 | Program Synthesis, Spatial Reasoning | —Unverified | 0 |
| Visual Analytics of Neuron Vulnerability to Adversarial Attacks on Convolutional Neural Networks | Mar 6, 2023 | Autonomous Driving, Medical Diagnosis | —Unverified | 0 |
| Visual Commonsense based Heterogeneous Graph Contrastive Learning | Nov 11, 2023 | Contrastive Learning, Question Answering | —Unverified | 0 |
| Visual Entailment: A Novel Task for Fine-Grained Image Understanding | Jan 20, 2019 | Natural Language Inference, Question Answering | —Unverified | 0 |
| Visual In-Context Learning for Large Vision-Language Models | Feb 18, 2024 | In-Context Learning, Position | —Unverified | 0 |
| Visual Language Models show widespread visual deficits on neuropsychological tests | Apr 15, 2025 | Object Recognition, Visual Reasoning | —Unverified | 0 |
| A Continual Learning Paradigm for Non-differentiable Visual Programming Frameworks on Visual Reasoning Tasks | Sep 18, 2023 | Continual Learning, Visual Reasoning | —Unverified | 0 |
| VisualPuzzles: Decoupling Multimodal Reasoning Evaluation from Domain Knowledge | Apr 14, 2025 | Logical Reasoning, Multimodal Reasoning | —Unverified | 0 |
| Visual Question Answering in the Medical Domain | Sep 20, 2023 | Contrastive Learning, Medical Visual Question Answering | —Unverified | 0 |