| Test-time Distribution Learning Adapter for Cross-modal Visual Reasoning | Mar 10, 2024 | Human-Object Interaction Detection, Prediction | —Unverified | 0 | 0 |
| Bootstrapping Top-down Information for Self-modulating Slot Attention | Nov 4, 2024 | Object, Object Discovery | —Unverified | 0 | 0 |
| TextCaps: a Dataset for Image Captioning with Reading Comprehension | Mar 24, 2020 | Image Captioning, Optical Character Recognition | —Unverified | 0 | 0 |
| A Corpus of Natural Language for Visual Reasoning | Jul 1, 2017 | Question Answering, Visual Question Answering (VQA) | —Unverified | 0 | 0 |
| V-PROM: A Benchmark for Visual Reasoning Using Visual Progressive Matrices | Jul 29, 2019 | Visual Reasoning | —Unverified | 0 | 0 |
| The Eye of Sherlock Holmes: Uncovering User Private Attribute Profiling via Vision-Language Model Agentic Framework | May 25, 2025 | Attribute, Language Modeling | —Unverified | 0 | 0 |
| VQA and Visual Reasoning: An Overview of Recent Datasets, Methods and Challenges | Dec 26, 2022 | Representation Learning, Visual Question Answering (VQA) | —Unverified | 0 | 0 |
| The Role of Chain-of-Thought in Complex Vision-Language Reasoning Task | Nov 15, 2023 | Visual Reasoning | —Unverified | 0 | 0 |
| The role of object-centric representations, guided attention, and external memory on generalizing visual relations | Apr 14, 2023 | Relation, Visual Reasoning | —Unverified | 0 | 0 |
| Wu's Method can Boost Symbolic AI to Rival Silver Medalists and AlphaGeometry to Outperform Gold Medalists at IMO Geometry | Apr 9, 2024 | Automated Theorem Proving, CPU | —Unverified | 0 | 0 |
| Think-Program-reCtify: 3D Situated Reasoning with Large Language Models | Apr 23, 2024 | Visual Reasoning | —Unverified | 0 | 0 |
| Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking | Feb 4, 2025 | Computational Efficiency, Multimodal Reasoning | —Unverified | 0 | 0 |
| Boosting Cross-task Transferability of Adversarial Patches with Visual Relations | Apr 11, 2023 | Image Captioning, Object Recognition | —Unverified | 0 | 0 |
| BlenderAlchemy: Editing 3D Graphics with Vision-Language Models | Apr 26, 2024 | Game Design, Image Generation | —Unverified | 0 | 0 |
| A Comprehensive Survey of Knowledge-Based Vision Question Answering Systems: The Lifecycle of Knowledge in Visual Reasoning Task | Apr 24, 2025 | Question Answering, Retrieval | —Unverified | 0 | 0 |
| A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise | Dec 19, 2023 | MME, Visual Reasoning | —Unverified | 0 | 0 |
| Towards A Unified Neural Architecture for Visual Recognition and Reasoning | Nov 10, 2023 | Object, object-detection | —Unverified | 0 | 0 |
| Big Generalizations with Small Data: Exploring the Role of Training Samples in Learning Adjectives of Size | Nov 1, 2019 | Small Data Image Classification, Visual Reasoning | —Unverified | 0 | 0 |
| Towards Generative Abstract Reasoning: Completing Raven's Progressive Matrix via Rule Abstraction and Selection | Jan 18, 2024 | Answer Generation, Attribute | —Unverified | 0 | 0 |
| Towards Grounded Visual Spatial Reasoning in Multi-Modal Vision Language Models | Aug 18, 2023 | Image-text matching, Object Localization | —Unverified | 0 | 0 |
| Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers | Jan 3, 2024 | Question Answering, Visual Grounding | —Unverified | 0 | 0 |
| Towards Unsupervised Visual Reasoning: Do Off-The-Shelf Features Know How to Reason? | Dec 20, 2022 | Question Answering, Representation Learning | —Unverified | 0 | 0 |
| Towards Visual Discrimination and Reasoning of Real-World Physical Dynamics: Physics-Grounded Anomaly Detection | Mar 5, 2025 | Anomaly Detection, Object | —Unverified | 0 | 0 |
| Transfer Learning in Visual and Relational Reasoning | Nov 27, 2019 | Question Answering, Relational Reasoning | —Unverified | 0 | 0 |
| Beyond Visual Understanding: Introducing PARROT-360V for Vision Language Model Benchmarking | Nov 20, 2024 | Benchmarking, Language Modeling | —Unverified | 0 | 0 |
| VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning | Jun 10, 2025 | Task Planning, Visual Reasoning | —Unverified | 0 | 0 |
| Transformers in Vision: A Survey | Jan 4, 2021 | Action Recognition, Activity Recognition | —Unverified | 0 | 0 |
| Transformers Utilization in Chart Understanding: A Review of Recent Advances & Future Trends | Oct 5, 2024 | Benchmarking, Chart Understanding | —Unverified | 0 | 0 |
| X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs | Jul 18, 2024 | Contrastive Learning, Representation Learning | —Unverified | 0 | 0 |
| Tree-of-Mixed-Thought: Combining Fast and Slow Thinking for Multi-hop Visual Reasoning | Aug 18, 2023 | Visual Reasoning | —Unverified | 0 | 0 |
| TRRNet: Tiered Relation Reasoning for Compositional Visual Question Answering | Aug 1, 2020 | Object, Question Answering | —Unverified | 0 | 0 |
| TVBench: Redesigning Video-Language Evaluation | Oct 10, 2024 | Multiple-choice, Open-Ended Question Answering | —Unverified | 0 | 0 |
| Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning | Mar 10, 2023 | Few-Shot Image Classification, image-classification | —Unverified | 0 | 0 |
| Beyond Visual Appearances: Privacy-sensitive Objects Identification via Hybrid Graph Reasoning | Jun 18, 2024 | Data Augmentation, Graph Generation | —Unverified | 0 | 0 |
| Understanding the computational demands underlying visual reasoning | Aug 8, 2021 | Visual Reasoning | —Unverified | 0 | 0 |
| Understand, Think, and Answer: Advancing Visual Reasoning with Large Multimodal Models | May 27, 2025 | Question Answering, Visual Reasoning | —Unverified | 0 | 0 |
| Beyond Task-Specific Reasoning: A Unified Conditional Generative Framework for Abstract Visual Reasoning | Jul 15, 2025 | Visual Reasoning | —Unverified | 0 | 0 |
| Weakly Supervised Semantic Parsing with Abstract Examples | Jul 1, 2018 | Semantic Parsing, Visual Reasoning | —Unverified | 0 | 0 |
| Human-Aligned Bench: Fine-Grained Assessment of Reasoning Ability in MLLMs vs. Humans | May 16, 2025 | Multimodal Reasoning, Visual Reasoning | —Unverified | 0 | 0 |
| Unifying Vision-Language Representation Space with Single-tower Transformer | Nov 21, 2022 | Contrastive Learning, Object Localization | —Unverified | 0 | 0 |
| Benchmark Visual Question Answer Models by using Focus Map | Jan 13, 2018 | Visual Reasoning | —Unverified | 0 | 0 |
| Automated Evaluation of Large Vision-Language Models on Self-driving Corner Cases | Apr 16, 2024 | Autonomous Driving, Visual Reasoning | —Unverified | 0 | 0 |
| Abstract Visual Reasoning with Tangram Shapes | Nov 29, 2022 | Visual Reasoning | —Unverified | 0 | 0 |
| Grounded Object Centric Learning | Jul 18, 2023 | Object, Object Discovery | —Unverified | 0 | 0 |
| Automated 3D Physical Simulation of Open-world Scene with Gaussian Splatting | Nov 19, 2024 | 3D Generation, GPU | —Unverified | 0 | 0 |
| VALSE: A Task-Independent Benchmark for Vision and Language Models centered on Linguistic Phenomena | Aug 17, 2021 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Abstracting Concept-Changing Rules for Solving Raven's Progressive Matrix Problems | Jul 15, 2023 | Answer Generation, Answer Selection | —Unverified | 0 | 0 |
| A Unified View of Abstract Visual Reasoning Problems | Jun 16, 2024 | Transfer Learning, Visual Reasoning | —Unverified | 0 | 0 |
| Webly Supervised Knowledge Embedding Model for Visual Reasoning | Jun 1, 2020 | model, Representation Learning | —Unverified | 0 | 0 |
| Attention on Abstract Visual Reasoning | Nov 14, 2019 | Program induction, Relation | —Unverified | 0 | 0 |