| CLIP-Powered TASS: Target-Aware Single-Stream Network for Audio-Visual Question Answering | May 13, 2024 | Audio-visual Question Answering, Audio-Visual Question Answering (AVQA) | —Unverified | 0 | 0 |
| Visual Structures Helps Visual Reasoning: Addressing the Binding Problem in VLMs | Jun 27, 2025 | Visual Reasoning | —Unverified | 0 | 0 |
| Same-different problems strain convolutional neural networks | Feb 9, 2018 | Memorization, Visual Reasoning | —Unverified | 0 | 0 |
| VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection | May 26, 2025 | Diversity, reinforcement-learning | —Unverified | 0 | 0 |
| SciVerse: Unveiling the Knowledge Comprehension and Visual Reasoning of LMMs on Multi-modal Scientific Problems | Mar 13, 2025 | Visual Reasoning | —Unverified | 0 | 0 |
| Seeing is Believing, but How Much? A Comprehensive Analysis of Verbalized Calibration in Vision-Language Models | May 26, 2025 | Uncertainty Quantification, Visual Reasoning | —Unverified | 0 | 0 |
| CityLoc: 6DoF Pose Distributional Localization for Text Descriptions in Large-Scale Scenes with Gaussian Representation | Jan 15, 2025 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Chop Chop BERT: Visual Question Answering by Chopping VisualBERT's Heads | Apr 30, 2021 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Seeing the Intangible: Survey of Image Classification into High-Level and Abstract Categories | Aug 21, 2023 | Classification, Clustering | —Unverified | 0 | 0 |
| Chitrarth: Bridging Vision and Language for a Billion People | Feb 21, 2025 | Diversity, Language Modeling | —Unverified | 0 | 0 |
| Chat2Layout: Interactive 3D Furniture Layout with a Multimodal LLM | Jul 31, 2024 | In-Context Learning, Layout Design | —Unverified | 0 | 0 |
| ChartReasoner: Code-Driven Modality Bridging for Long-Chain Reasoning in Chart Question Answering | Jun 11, 2025 | Chart Question Answering, Image to text | —Unverified | 0 | 0 |
| ChartNet: Visual Reasoning over Statistical Charts using MAC-Networks | Nov 21, 2019 | General Classification, Visual Reasoning | —Unverified | 0 | 0 |
| ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models | May 19, 2025 | Chart Question Answering, Chart Understanding | —Unverified | 0 | 0 |
| SelfEval: Leveraging the discriminative nature of generative models for evaluation | Nov 17, 2023 | Attribute, Visual Reasoning | —Unverified | 0 | 0 |
| Self-Segregating and Coordinated-Segregating Transformer for Focused Deep Multi-Modular Network for Visual Question Answering | Jun 25, 2020 | Diversity, Question Answering | —Unverified | 0 | 0 |
| ChartBench: A Benchmark for Complex Visual Reasoning in Charts | Dec 26, 2023 | Visual Reasoning | —Unverified | 0 | 0 |
| Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI | Feb 24, 2025 | document understanding, Multimodal Reasoning | —Unverified | 0 | 0 |
| SHOP-VRB: A Visual Reasoning Benchmark for Object Perception | Apr 6, 2020 | Object, Visual Reasoning | —Unverified | 0 | 0 |
| Does Acceleration Cause Hidden Instability in Vision Language Models? Uncovering Instance-Level Divergence Through a Large-Scale Empirical Study | Mar 9, 2025 | Quantization, Token Reduction | —Unverified | 0 | 0 |
| Simple Token-Level Confidence Improves Caption Correctness | May 11, 2023 | Hallucination, Image Captioning | —Unverified | 0 | 0 |
| Chain of Functions: A Programmatic Pipeline for Fine-Grained Chart Reasoning Data | Mar 20, 2025 | Diversity, Visual Reasoning | —Unverified | 0 | 0 |
| 2nd Place Solution to the GQA Challenge 2019 | Jul 16, 2019 | Question Answering, Visual Question Answering | —Unverified | 0 | 0 |
| Chain-of-Focus: Adaptive Visual Search and Zooming for Multimodal Reasoning via RL | May 21, 2025 | 4k, Multimodal Reasoning | —Unverified | 0 | 0 |
| Slow Perception: Let's Perceive Geometric Figures Step-by-step | Dec 30, 2024 | Math, Visual Reasoning | —Unverified | 0 | 0 |
| VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models | Apr 21, 2025 | Attribute, Visual Reasoning | —Unverified | 0 | 0 |
| SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection | Mar 5, 2024 | Concept Alignment, Explanation Generation | —Unverified | 0 | 0 |
| Social-IQ: A Question Answering Benchmark for Artificial Social Intelligence | Jun 1, 2019 | Question Answering, Visual Reasoning | —Unverified | 0 | 0 |
| Socratic-MCTS: Test-Time Visual Reasoning by Asking the Right Questions | Jun 10, 2025 | Visual Reasoning | —Unverified | 0 | 0 |
| ViUniT: Visual Unit Tests for More Robust Visual Programming | Dec 12, 2024 | Image Generation, Image-text matching | —Unverified | 0 | 0 |
| VL-BEiT: Generative Vision-Language Pretraining | Jun 2, 2022 | image-classification, Image Classification | —Unverified | 0 | 0 |
| CAVL: Learning Contrastive and Adaptive Representations of Vision and Language | Apr 10, 2023 | Image Retrieval, Phrase Grounding | —Unverified | 0 | 0 |
| Can We Automate Diagrammatic Reasoning? | Feb 13, 2019 | Visual Reasoning | —Unverified | 0 | 0 |
| Can VLMs be used on videos for action recognition? LLMs are Visual Reasoning Coordinators | Jul 20, 2024 | Action Recognition, CoLA | —Unverified | 0 | 0 |
| Spatial Knowledge Distillation to aid Visual Reasoning | Dec 10, 2018 | Diagnostic, Knowledge Distillation | —Unverified | 0 | 0 |
| Adaptive recurrent vision performs zero-shot computation scaling to unseen difficulty levels | Nov 12, 2023 | Pathfinder, Visual Reasoning | —Unverified | 0 | 0 |
| VLM Q-Learning: Aligning Vision-Language Models for Interactive Decision-Making | May 6, 2025 | Decision Making, General Knowledge | —Unverified | 0 | 0 |
| Cantor: Inspiring Multimodal Chain-of-Thought of MLLM | Apr 24, 2024 | Decision Making, Logical Reasoning | —Unverified | 0 | 0 |
| Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps | May 24, 2025 | Scene Understanding, Spatial Reasoning | —Unverified | 0 | 0 |
| CameraBench: Benchmarking Visual Reasoning in MLLMs via Photography | Apr 14, 2025 | Benchmarking, Visual Reasoning | —Unverified | 0 | 0 |
| SwitchCIT: Switching for Continual Instruction Tuning | Jul 16, 2024 | Text Generation, Visual Reasoning | —Unverified | 0 | 0 |
| Synthetic Visual Genome | Jun 9, 2025 | Referring Expression, Referring Expression Comprehension | —Unverified | 0 | 0 |
| SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis | Jun 2, 2025 | 8k, Math | —Unverified | 0 | 0 |
| Systematic Abductive Reasoning via Diverse Relation Representations in Vector-symbolic Architecture | Jan 21, 2025 | Attribute, Diversity | —Unverified | 0 | 0 |
| Adaptive Discrete Communication Bottlenecks with Dynamic Vector Quantization | Feb 2, 2022 | Quantization, reinforcement-learning | —Unverified | 0 | 0 |
| Bridging the Gap: Exploring the Capabilities of Bridge-Architectures for Complex Visual Reasoning Tasks | Jul 31, 2023 | Image Retrieval, Object | —Unverified | 0 | 0 |
| Take A Step Back: Rethinking the Two Stages in Visual Reasoning | Jul 29, 2024 | Logical Reasoning, Question Answering | —Unverified | 0 | 0 |
| VLM@school -- Evaluation of AI image understanding on German middle school knowledge | Jun 13, 2025 | Visual Reasoning | —Unverified | 0 | 0 |
| World-aware Planning Narratives Enhance Large Vision-Language Model Planner | Jun 26, 2025 | Imitation Learning, Language Modeling | —Unverified | 0 | 0 |
| ACRE: Abstract Causal REasoning Beyond Covariation | Mar 26, 2021 | Blocking, Causal Discovery | —Unverified | 0 | 0 |