| Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1) | Apr 4, 2025 | Multimodal Reasoning | —Unverified | 0 |
| MORALISE: A Structured Benchmark for Moral Alignment in Visual Language Models | May 20, 2025 | Autonomous Driving, Multimodal Reasoning | —Unverified | 0 |
| More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models | May 23, 2025 | Diagnostic, Hallucination | —Unverified | 0 |
| Structured Graph Representations for Visual Narrative Reasoning: A Hierarchical Framework for Comics | Apr 14, 2025 | Knowledge Graphs, Multimodal Reasoning | —Unverified | 0 |
| Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level | Nov 15, 2024 | Benchmarking, counterfactual | —Unverified | 0 |
| MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification | Mar 16, 2025 | Multimodal Reasoning | —Unverified | 0 |
| MultiFinRAG: An Optimized Multimodal Retrieval-Augmented Generation (RAG) Framework for Financial Question Answering | Jun 25, 2025 | Multimodal Reasoning, Question Answering | —Unverified | 0 |
| MultiMind: Enhancing Werewolf Agents with Multimodal Reasoning and Theory of Mind | Apr 25, 2025 | Large Language Model, Multimodal Reasoning | —Unverified | 0 |
| A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges | Dec 16, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models | Feb 22, 2025 | Multimodal Reasoning | —Unverified | 0 |
| Multimodal Reasoning Agent for Zero-Shot Composed Image Retrieval | May 26, 2025 | Contrastive Learning, Image Retrieval | —Unverified | 0 |
| Multimodal Reasoning with Multimodal Knowledge Graph | Jun 4, 2024 | cross-modal alignment, Graph Attention | —Unverified | 0 |
| Multimodal Transformer with Multi-View Visual Representation for Image Captioning | May 20, 2019 | Decoder, Image Captioning | —Unverified | 0 |
| MuSciClaims: Multimodal Scientific Claim Verification | Jun 5, 2025 | Articles, Claim Verification | —Unverified | 0 |
| Assessing GPT4-V on Structured Reasoning Tasks | Dec 13, 2023 | Code Generation, Language Modeling | —Unverified | 0 |
| NL-Eye: Abductive NLI for Images | Oct 3, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| NoTeS-Bank: Benchmarking Neural Transcription and Search for Scientific Notes Understanding | Apr 12, 2025 | Benchmarking, Document AI | —Unverified | 0 |
| NVLM: Open Frontier-Class Multimodal LLMs | Sep 17, 2024 | Math, Multimodal Reasoning | —Unverified | 0 |
| X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains | May 6, 2025 | Multimodal Reasoning | —Unverified | 0 |
| OmniAD: Detect and Understand Industrial Anomaly via Multimodal Reasoning | May 28, 2025 | Anomaly Detection, Multimodal Reasoning | —Unverified | 0 |
| On scalable oversight with weak LLMs judging strong LLMs | Jul 5, 2024 | Multimodal Reasoning, Question Answering | —Unverified | 0 |
| ASPO: Adaptive Sentence-Level Preference Optimization for Fine-Grained Multimodal Reasoning | May 25, 2025 | Computational Efficiency, Multimodal Reasoning | —Unverified | 0 |
| Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought | May 29, 2025 | Multimodal Reasoning | —Unverified | 0 |
| Optimizing Vision-Language Interactions Through Decoder-Only Models | Dec 14, 2024 | Decoder, Image Captioning | —Unverified | 0 |
| Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts | Jun 12, 2025 | Diversity, Minecraft | —Unverified | 0 |
| Overview of the NLPCC 2025 Shared Task 4: Multi-modal, Multilingual, and Multi-hop Medical Instructional Video Question Answering Challenge | May 11, 2025 | Multimodal Reasoning, Question Answering | —Unverified | 0 |
| Perception-Aware Policy Optimization for Multimodal Reasoning | Jul 8, 2025 | Multimodal Reasoning | —Unverified | 0 |
| PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning | Jun 17, 2025 | General Reinforcement Learning, Multimodal Reasoning | —Unverified | 0 |
| Personality-aware Human-centric Multimodal Reasoning: A New Task, Dataset and Baselines | Apr 5, 2023 | Decision Making, Multimodal Reasoning | —Unverified | 0 |
| POEM: Interactive Prompt Optimization for Enhancing Multimodal Reasoning of Large Language Models | Jun 6, 2024 | Multimodal Reasoning, Prompt Engineering | —Unverified | 0 |
| Point-RFT: Improving Multimodal Reasoning with Visually Grounded Reinforcement Finetuning | May 26, 2025 | document understanding, Multimodal Reasoning | —Unverified | 0 |
| Position: Empowering Time Series Reasoning with Multimodal LLMs | Feb 3, 2025 | Decision Making, Multimodal Reasoning | —Unverified | 0 |
| Preemptive Hallucination Reduction: An Input-Level Approach for Multimodal Language Model | May 29, 2025 | Hallucination, Language Modeling | —Unverified | 0 |
| Premise-based Multimodal Reasoning: Conditional Inference on Joint Textual and Visual Clues | May 15, 2021 | Multimodal Reasoning, Natural Language Inference | —Unverified | 0 |
| Progressive Multimodal Reasoning via Active Retrieval | Dec 19, 2024 | Diversity, Multimodal Reasoning | —Unverified | 0 |
| Proof of Thought : Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning | Sep 25, 2024 | Benchmarking, Formal Logic | —Unverified | 0 |
| PRS-Med: Position Reasoning Segmentation with Vision-Language Model in Medical Imaging | May 17, 2025 | Image Segmentation, Language Modeling | —Unverified | 0 |
| An X-Ray Is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation | Oct 4, 2024 | Language Modelling, Multimodal Reasoning | —Unverified | 0 |
| Q-Heart: ECG Question Answering via Knowledge-Informed Multimodal LLMs | May 7, 2025 | Electrocardiography (ECG), Language Modeling | —Unverified | 0 |
| VisualPRM: An Effective Process Reward Model for Multimodal Reasoning | Mar 13, 2025 | Multimodal Reasoning | —Unverified | 0 |
| Question Aware Vision Transformer for Multimodal Reasoning | Feb 8, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| All-in-one: Understanding and Generation in Multimodal Reasoning with the MAIA Benchmark | Feb 24, 2025 | All, Multimodal Reasoning | —Unverified | 0 |
| Agentic Multimodal AI for Hyperpersonalized B2B and B2C Advertising in Competitive Markets: An AI-Driven Competitive Advertising Framework | Apr 1, 2025 | Decision Making, In-Context Learning | —Unverified | 0 |
| Agentic 3D Scene Generation with Spatially Contextualized VLMs | May 26, 2025 | Multimodal Reasoning, Scene Generation | —Unverified | 0 |
| RadFabric: Agentic AI System with Reasoning Capability for Radiology | Jun 17, 2025 | Diagnostic, Multimodal Reasoning | —Unverified | 0 |
| R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation | May 4, 2025 | Language Model Evaluation, Language Modeling | —Unverified | 0 |
| Reducing the Vision and Language Bias for Temporal Sentence Grounding | Jul 27, 2022 | Information Retrieval, Multimodal Reasoning | —Unverified | 0 |
| Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models | Apr 30, 2025 | Multimodal Reasoning, Reinforcement Learning (RL) | —Unverified | 0 |
| A Generative Framework for Bidirectional Image-Report Understanding in Chest Radiography | Feb 9, 2025 | Diagnostic, Multimodal Reasoning | —Unverified | 0 |
| Retrieval Meets Reasoning: Even High-school Textbook Knowledge Benefits Multimodal Reasoning | May 31, 2024 | Answer Generation, Multimodal Reasoning | —Unverified | 0 |