| Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps | May 24, 2025 | Scene Understanding, Spatial Reasoning | —Unverified | 0 |
| Cantor: Inspiring Multimodal Chain-of-Thought of MLLM | Apr 24, 2024 | Decision Making, Logical Reasoning | —Unverified | 0 |
| Can VLMs be used on videos for action recognition? LLMs are Visual Reasoning Coordinators | Jul 20, 2024 | Action Recognition, CoLA | —Unverified | 0 |
| Can We Automate Diagrammatic Reasoning? | Feb 13, 2019 | Visual Reasoning | —Unverified | 0 |
| CAVL: Learning Contrastive and Adaptive Representations of Vision and Language | Apr 10, 2023 | Image Retrieval, Phrase Grounding | —Unverified | 0 |
| Chain-of-Focus: Adaptive Visual Search and Zooming for Multimodal Reasoning via RL | May 21, 2025 | 4k, Multimodal Reasoning | —Unverified | 0 |
| Chain of Functions: A Programmatic Pipeline for Fine-Grained Chart Reasoning Data | Mar 20, 2025 | Diversity, Visual Reasoning | —Unverified | 0 |
| ChartBench: A Benchmark for Complex Visual Reasoning in Charts | Dec 26, 2023 | Visual Reasoning | —Unverified | 0 |
| ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models | May 19, 2025 | Chart Question Answering, Chart Understanding | —Unverified | 0 |
| ChartNet: Visual Reasoning over Statistical Charts using MAC-Networks | Nov 21, 2019 | General Classification, Visual Reasoning | —Unverified | 0 |
| ChartReasoner: Code-Driven Modality Bridging for Long-Chain Reasoning in Chart Question Answering | Jun 11, 2025 | Chart Question Answering, Image to text | —Unverified | 0 |
| Chat2Layout: Interactive 3D Furniture Layout with a Multimodal LLM | Jul 31, 2024 | In-Context Learning, Layout Design | —Unverified | 0 |
| Chitrarth: Bridging Vision and Language for a Billion People | Feb 21, 2025 | Diversity, Language Modeling | —Unverified | 0 |
| Chop Chop BERT: Visual Question Answering by Chopping VisualBERT's Heads | Apr 30, 2021 | Question Answering, Visual Question Answering | —Unverified | 0 |
| CityLoc: 6DoF Pose Distributional Localization for Text Descriptions in Large-Scale Scenes with Gaussian Representation | Jan 15, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| CLIP-Powered TASS: Target-Aware Single-Stream Network for Audio-Visual Question Answering | May 13, 2024 | Audio-visual Question Answering, Audio-Visual Question Answering (AVQA) | —Unverified | 0 |
| CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs | Jan 5, 2024 | Image Comprehension, Image to text | —Unverified | 0 |
| Code Repair with LLMs gives an Exploration-Exploitation Tradeoff | May 26, 2024 | Code Repair, Language Modeling | —Unverified | 0 |
| A Cognitive Paradigm Approach to Probe the Perception-Reasoning Interface in VLMs | Jan 23, 2025 | Descriptive, Diagnostic | —Unverified | 0 |
| Comparing Visual Reasoning in Humans and AI | Apr 29, 2021 | Sentence, Visual Reasoning | —Unverified | 0 |
| Comparison Visual Instruction Tuning | Jun 13, 2024 | Instruction Following, Novelty Detection | —Unverified | 0 |
| Compositional Law Parsing with Latent Random Functions | Sep 15, 2022 | Position, Visual Reasoning | —Unverified | 0 |
| Continual learning on 3D point clouds with random compressed rehearsal | May 16, 2022 | Continual Learning, Visual Reasoning | —Unverified | 0 |
| Cops-Ref: A new Dataset and Task on Compositional Referring Expression Comprehension | Mar 1, 2020 | Referring Expression, Referring Expression Comprehension | —Unverified | 0 |
| Co-VQA : Answering by Interactive Sub Question Sequence | Nov 16, 2021 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Co-VQA : Answering by Interactive Sub Question Sequence | Apr 2, 2022 | Question Answering, Visual Question Answering | —Unverified | 0 |
| Critical Features Tracking on Triangulated Irregular Networks by a Scale-Space Method | Sep 10, 2024 | Visual Reasoning | —Unverified | 0 |
| Curriculum Learning for Compositional Visual Reasoning | Mar 27, 2023 | Question Answering, Visual Question Answering | —Unverified | 0 |
| DAReN: A Collaborative Approach Towards Reasoning And Disentangling | Sep 27, 2021 | Disentanglement, Inductive Bias | —Unverified | 0 |
| Multimodal Analysis Of Google Bard And GPT-Vision: Experiments In Visual Reasoning | Aug 17, 2023 | Common Sense Reasoning, Optical Character Recognition | —Unverified | 0 |
| Deep Learning Methods for Abstract Visual Reasoning: A Survey on Raven's Progressive Matrices | Jan 28, 2022 | Visual Reasoning | —Unverified | 0 |
| Deep Neural Networks for Visual Reasoning | Sep 24, 2022 | Multimodal Reasoning, Visual Reasoning | —Unverified | 0 |
| Deep Reason: A Strong Baseline for Real-World Visual Reasoning | May 24, 2019 | Visual Reasoning | —Unverified | 0 |
| Deep Visual Reasoning: Learning to Predict Action Sequences for Task and Motion Planning from an Initial Scene Image | Jun 9, 2020 | Motion Planning, Task and Motion Planning | —Unverified | 0 |
| Disentangling Knowledge-based and Visual Reasoning by Question Decomposition in KB-VQA | Jun 27, 2024 | General Knowledge, Question Answering | —Unverified | 0 |
| Doc-CoB: Enhancing Multi-Modal Document Understanding with Visual Chain-of-Boxes Reasoning | May 24, 2025 | document understanding, Visual Reasoning | —Unverified | 0 |
| Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? | Apr 18, 2025 | Math, Visual Reasoning | —Unverified | 0 |
| Does Structural Attention Improve Compositional Representations in Vision-Language Models? | Dec 3, 2022 | Visual Reasoning | —Unverified | 0 |
| Does Visual Pretraining Help End-to-End Reasoning? | Jul 17, 2023 | image-classification, Image Classification | —Unverified | 0 |
| Do Vision-Language Transformers Exhibit Visual Commonsense? An Empirical Study of VCR | May 27, 2024 | Question Answering, TAG | —Unverified | 0 |
| Do we Really Need Visual Instructions? Towards Visual Instruction-Free Fine-tuning for Large Vision-Language Models | Feb 17, 2025 | Instruction Following, visual instruction following | —Unverified | 0 |
| Doxing via the Lens: Revealing Location-related Privacy Leakage on Multi-modal Large Reasoning Models | Apr 27, 2025 | Visual Reasoning, World Knowledge | —Unverified | 0 |
| DRIVINGVQA: Analyzing Visual Chain-of-Thought Reasoning of Vision Language Models in Real-World Scenarios with Driving Theory Tests | Jan 8, 2025 | Multimodal Reasoning, Multiple-choice | —Unverified | 0 |
| Dual Local-Global Contextual Pathways for Recognition in Aerial Imagery | May 18, 2016 | Object Recognition, Road Segmentation | —Unverified | 0 |
| DWIM: Towards Tool-aware Visual Reasoning via Discrepancy-aware Workflow Generation & Instruct-Masking Tuning | Mar 25, 2025 | Visual Reasoning | —Unverified | 0 |
| Dynamic Graph Attention for Referring Expression Comprehension | Sep 18, 2019 | Graph Attention, Referring Expression | —Unverified | 0 |
| Dynamic Visual Reasoning by Learning Differentiable Physics Models from Video and Language | Oct 28, 2021 | counterfactual, Visual Reasoning | —Unverified | 0 |
| EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues | Dec 19, 2024 | Change Detection, Disaster Response | —Unverified | 0 |
| EgoReID: Cross-view Self-Identification and Human Re-identification in Egocentric and Surveillance Videos | Dec 24, 2016 | Person Re-Identification, Visual Reasoning | —Unverified | 0 |
| End-to-End Chart Summarization via Visual Chain-of-Thought in Vision-Language Models | Feb 24, 2025 | Visual Reasoning | —Unverified | 0 |