| Mind with Eyes: from Language Reasoning to Multimodal Reasoning | Mar 23, 2025 | Action Generation, Multimodal Reasoning | —Unverified | 0 | 0 |
| MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM | May 30, 2025 | Hallucination, Multimodal Reasoning | —Unverified | 0 | 0 |
| Mitigating Object Hallucinations in Large Vision-Language Models via Attention Calibration | Feb 4, 2025 | Attribute, Hallucination | —Unverified | 0 | 0 |
| Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning | Mar 17, 2025 | Mathematical Reasoning, Multimodal Reasoning | —Unverified | 0 | 0 |
| Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency | Jun 10, 2025 | Multimodal Reasoning | —Unverified | 0 | 0 |
| MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency | Feb 13, 2025 | Benchmarking, Math | —Unverified | 0 | 0 |
| MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs | May 27, 2025 | Logical Reasoning, MME | —Unverified | 0 | 0 |
| MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models | Apr 4, 2025 | Benchmarking, Image Generation | —Unverified | 0 | 0 |
| Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking | Feb 4, 2025 | Computational Efficiency, Multimodal Reasoning | —Unverified | 0 | 0 |
| MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation | May 23, 2025 | Audio Generation, Benchmarking | —Unverified | 0 | 0 |
| MMMG: A Massive, Multidisciplinary, Multi-Tier Generation Benchmark for Text-to-Image Reasoning | Jun 12, 2025 | Image Generation, Multimodal Reasoning | —Unverified | 0 | 0 |
| BBA: Bi-Modal Behavioral Alignment for Reasoning with Large Vision-Language Models | Feb 21, 2024 | Geometry Problem Solving, Molecular Property Prediction | —Unverified | 0 | 0 |
| AutoFraudNet: A Multimodal Network to Detect Fraud in the Auto Insurance Industry | Jan 15, 2023 | Fraud Detection, Multimodal Reasoning | —Unverified | 0 | 0 |
| MMR-V: What's Left Unsaid? A Benchmark for Multimodal Deep Reasoning in Videos | Jun 4, 2025 | Multimodal Reasoning | —Unverified | 0 | 0 |
| MMS-VPR: Multimodal Street-Level Visual Place Recognition Dataset and Benchmark | May 18, 2025 | Multimodal Reasoning, Visual Place Recognition | —Unverified | 0 | 0 |
| Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1) | Apr 4, 2025 | Multimodal Reasoning | —Unverified | 0 | 0 |
| MORALISE: A Structured Benchmark for Moral Alignment in Visual Language Models | May 20, 2025 | Autonomous Driving, Multimodal Reasoning | —Unverified | 0 | 0 |
| More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models | May 23, 2025 | Diagnostic, Hallucination | —Unverified | 0 | 0 |
| Structured Graph Representations for Visual Narrative Reasoning: A Hierarchical Framework for Comics | Apr 14, 2025 | Knowledge Graphs, Multimodal Reasoning | —Unverified | 0 | 0 |
| Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level | Nov 15, 2024 | Benchmarking, counterfactual | —Unverified | 0 | 0 |
| MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification | Mar 16, 2025 | Multimodal Reasoning | —Unverified | 0 | 0 |
| MultiFinRAG: An Optimized Multimodal Retrieval-Augmented Generation (RAG) Framework for Financial Question Answering | Jun 25, 2025 | Multimodal Reasoning, Question Answering | —Unverified | 0 | 0 |
| MultiMind: Enhancing Werewolf Agents with Multimodal Reasoning and Theory of Mind | Apr 25, 2025 | Large Language Model, Multimodal Reasoning | —Unverified | 0 | 0 |
| A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges | Dec 16, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models | Feb 22, 2025 | Multimodal Reasoning | —Unverified | 0 | 0 |
| Multimodal Reasoning Agent for Zero-Shot Composed Image Retrieval | May 26, 2025 | Contrastive Learning, Image Retrieval | —Unverified | 0 | 0 |
| Multimodal Reasoning with Multimodal Knowledge Graph | Jun 4, 2024 | cross-modal alignment, Graph Attention | —Unverified | 0 | 0 |
| Multimodal Transformer with Multi-View Visual Representation for Image Captioning | May 20, 2019 | Decoder, Image Captioning | —Unverified | 0 | 0 |
| MuSciClaims: Multimodal Scientific Claim Verification | Jun 5, 2025 | Articles, Claim Verification | —Unverified | 0 | 0 |
| Assessing GPT4-V on Structured Reasoning Tasks | Dec 13, 2023 | Code Generation, Language Modeling | —Unverified | 0 | 0 |
| NL-Eye: Abductive NLI for Images | Oct 3, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| NoTeS-Bank: Benchmarking Neural Transcription and Search for Scientific Notes Understanding | Apr 12, 2025 | Benchmarking, Document AI | —Unverified | 0 | 0 |
| NVLM: Open Frontier-Class Multimodal LLMs | Sep 17, 2024 | Math, Multimodal Reasoning | —Unverified | 0 | 0 |
| X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains | May 6, 2025 | Multimodal Reasoning | —Unverified | 0 | 0 |
| OmniAD: Detect and Understand Industrial Anomaly via Multimodal Reasoning | May 28, 2025 | Anomaly Detection, Multimodal Reasoning | —Unverified | 0 | 0 |
| On scalable oversight with weak LLMs judging strong LLMs | Jul 5, 2024 | Multimodal Reasoning, Question Answering | —Unverified | 0 | 0 |
| ASPO: Adaptive Sentence-Level Preference Optimization for Fine-Grained Multimodal Reasoning | May 25, 2025 | Computational Efficiency, Multimodal Reasoning | —Unverified | 0 | 0 |
| Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought | May 29, 2025 | Multimodal Reasoning | —Unverified | 0 | 0 |
| Optimizing Vision-Language Interactions Through Decoder-Only Models | Dec 14, 2024 | Decoder, Image Captioning | —Unverified | 0 | 0 |
| Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts | Jun 12, 2025 | Diversity, Minecraft | —Unverified | 0 | 0 |
| Overview of the NLPCC 2025 Shared Task 4: Multi-modal, Multilingual, and Multi-hop Medical Instructional Video Question Answering Challenge | May 11, 2025 | Multimodal Reasoning, Question Answering | —Unverified | 0 | 0 |
| Perception-Aware Policy Optimization for Multimodal Reasoning | Jul 8, 2025 | Multimodal Reasoning | —Unverified | 0 | 0 |
| PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning | Jun 17, 2025 | General Reinforcement Learning, Multimodal Reasoning | —Unverified | 0 | 0 |
| Personality-aware Human-centric Multimodal Reasoning: A New Task, Dataset and Baselines | Apr 5, 2023 | Decision Making, Multimodal Reasoning | —Unverified | 0 | 0 |
| POEM: Interactive Prompt Optimization for Enhancing Multimodal Reasoning of Large Language Models | Jun 6, 2024 | Multimodal Reasoning, Prompt Engineering | —Unverified | 0 | 0 |
| Point-RFT: Improving Multimodal Reasoning with Visually Grounded Reinforcement Finetuning | May 26, 2025 | document understanding, Multimodal Reasoning | —Unverified | 0 | 0 |
| Position: Empowering Time Series Reasoning with Multimodal LLMs | Feb 3, 2025 | Decision Making, Multimodal Reasoning | —Unverified | 0 | 0 |
| Preemptive Hallucination Reduction: An Input-Level Approach for Multimodal Language Model | May 29, 2025 | Hallucination, Language Modeling | —Unverified | 0 | 0 |
| Premise-based Multimodal Reasoning: Conditional Inference on Joint Textual and Visual Clues | May 15, 2021 | Multimodal Reasoning, Natural Language Inference | —Unverified | 0 | 0 |
| Progressive Multimodal Reasoning via Active Retrieval | Dec 19, 2024 | Diversity, Multimodal Reasoning | —Unverified | 0 | 0 |