| Shifting More Attention to Visual Backbone: Query-modulated Refinement Networks for End-to-End Visual Grounding | Mar 29, 2022 | Multimodal Reasoning, Visual Grounding | Code Available | 1 |
| Fine-Grained Visual Entailment | Mar 29, 2022 | Multimodal Reasoning, Visual Entailment | Code Available | 1 |
| PACS: A Dataset for Physical Audiovisual CommonSense Reasoning | Mar 21, 2022 | Common Sense Reasoning, Multimodal Reasoning | Code Available | 1 |
| WebQA: Multihop and Multimodal QA | Sep 1, 2021 | Image Retrieval, Multimodal Reasoning | Code Available | 1 |
| Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision | Aug 12, 2021 | 3D geometry, Descriptive | Code Available | 1 |
| MERLOT: Multimodal Neural Script Knowledge Models | Jun 4, 2021 | Multimodal Reasoning, Visual Commonsense Reasoning | Code Available | 1 |
| A Multimodal Framework for the Detection of Hateful Memes | Dec 23, 2020 | Ensemble Learning, Multimodal Reasoning | Code Available | 1 |
| e-SNLI-VE: Corrected Visual-Textual Entailment with Natural Language Explanations | Apr 7, 2020 | Multimodal Reasoning, Natural Language Inference | Code Available | 1 |
| EgoPrune: Efficient Token Pruning for Egomotion Video Reasoning in Embodied Agent | Jul 21, 2025 | Multimodal Reasoning | Unverified | 0 |
| Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark | Jul 17, 2025 | Multimodal Reasoning, Pose Estimation | Unverified | 0 |
| The Synergy Dilemma of Long-CoT SFT and RL: Investigating Post-Training Techniques for Reasoning VLMs | Jul 10, 2025 | Multimodal Reasoning, Reinforcement Learning (RL) | Unverified | 0 |
| MagiC: Evaluating Multimodal Cognition Toward Grounded Visual Reasoning | Jul 9, 2025 | Diagnostic, Multimodal Reasoning | Unverified | 0 |
| Enhancing Scientific Visual Question Answering through Multimodal Reasoning and Ensemble Modeling | Jul 8, 2025 | Articles, Multimodal Reasoning | Unverified | 0 |
| Perception-Aware Policy Optimization for Multimodal Reasoning | Jul 8, 2025 | Multimodal Reasoning | Unverified | 0 |
| APO: Enhancing Reasoning Ability of MLLMs via Asymmetric Policy Optimization | Jun 26, 2025 | Multimodal Reasoning, Reinforcement Learning (RL) | Code Available | 0 |
| MultiFinRAG: An Optimized Multimodal Retrieval-Augmented Generation (RAG) Framework for Financial Question Answering | Jun 25, 2025 | Multimodal Reasoning, Question Answering | Unverified | 0 |
| Adapting Vision-Language Models for Evaluating World Models | Jun 22, 2025 | Action Recognition, Multimodal Reasoning | Unverified | 0 |
| GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning | Jun 19, 2025 | Multimodal Reasoning, reinforcement-learning | Unverified | 0 |
| GeoGuess: Multimodal Reasoning based on Hierarchy of Visual Information in Street View | Jun 19, 2025 | Multimodal Reasoning | Unverified | 0 |
| MEGC2025: Micro-Expression Grand Challenge on Spot Then Recognize and Visual Question Answering | Jun 18, 2025 | Multimodal Reasoning, Question Answering | Unverified | 0 |
| RadFabric: Agentic AI System with Reasoning Capability for Radiology | Jun 17, 2025 | Diagnostic, Multimodal Reasoning | Unverified | 0 |
| PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning | Jun 17, 2025 | General Reinforcement Learning, Multimodal Reasoning | Unverified | 0 |
| FinLMM-R1: Enhancing Financial Reasoning in LMM through Scalable Data and Reward Design | Jun 16, 2025 | Answer Generation, Arithmetic Reasoning | Unverified | 0 |
| VL-GenRM: Enhancing Vision-Language Verification via Vision Experts and Iterative Training | Jun 16, 2025 | Hallucination, Multimodal Reasoning | Unverified | 0 |
| MM-R5: MultiModal Reasoning-Enhanced ReRanker via Reinforcement Learning for Document Retrieval | Jun 14, 2025 | Instruction Following, Multimodal Reasoning | Code Available | 0 |
| MMMG: A Massive, Multidisciplinary, Multi-Tier Generation Benchmark for Text-to-Image Reasoning | Jun 12, 2025 | Image Generation, Multimodal Reasoning | Unverified | 0 |
| FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal Large Language Models | Jun 12, 2025 | Cross-Modal Retrieval, Federated Learning | Unverified | 0 |
| Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning | Jun 12, 2025 | Attribute, Multimodal Reasoning | Unverified | 0 |
| Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts | Jun 12, 2025 | Diversity, Minecraft | Unverified | 0 |
| ChartReasoner: Code-Driven Modality Bridging for Long-Chain Reasoning in Chart Question Answering | Jun 11, 2025 | Chart Question Answering, Image to text | Unverified | 0 |
| Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency | Jun 10, 2025 | Multimodal Reasoning | Unverified | 0 |
| KokushiMD-10: Benchmark for Evaluating Large Language Models on Ten Japanese National Healthcare Licensing Examinations | Jun 9, 2025 | Multimodal Reasoning, Visual Reasoning | Unverified | 0 |
| Decoupling the Image Perception and Multimodal Reasoning for Reasoning Segmentation with Digital Twin Representations | Jun 9, 2025 | Large Language Model, Multimodal Reasoning | Unverified | 0 |
| Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation | Jun 5, 2025 | Decision Making, Multimodal Reasoning | Unverified | 0 |
| MuSciClaims: Multimodal Scientific Claim Verification | Jun 5, 2025 | Articles, Claim Verification | Unverified | 0 |
| Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning | Jun 4, 2025 | Multimodal Reasoning, Reinforcement Learning (RL) | Unverified | 0 |
| RSVP: Reasoning Segmentation via Visual Prompting and Multi-modal Chain-of-Thought | Jun 4, 2025 | Multimodal Reasoning, Reasoning Segmentation | Unverified | 0 |
| MMR-V: What's Left Unsaid? A Benchmark for Multimodal Deep Reasoning in Videos | Jun 4, 2025 | Multimodal Reasoning | Unverified | 0 |
| SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning | Jun 2, 2025 | Multimodal Reasoning, reinforcement-learning | Unverified | 0 |
| GThinker: Towards General Multimodal Reasoning via Cue-Guided Rethinking | Jun 1, 2025 | 4k, Math | Code Available | 0 |
| MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM | May 30, 2025 | Hallucination, Multimodal Reasoning | Unverified | 0 |
| Infi-Med: Low-Resource Medical MLLMs with Robust Reasoning Evaluation | May 29, 2025 | Diagnostic, Multimodal Reasoning | Unverified | 0 |
| Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought | May 29, 2025 | Multimodal Reasoning | Unverified | 0 |
| Preemptive Hallucination Reduction: An Input-Level Approach for Multimodal Language Model | May 29, 2025 | Hallucination, Language Modeling | Unverified | 0 |
| GAM-Agent: Game-Theoretic and Uncertainty-Aware Collaboration for Complex Visual Reasoning | May 29, 2025 | Multimodal Reasoning, MVBench | Unverified | 0 |
| MMBoundary: Advancing MLLM Knowledge Boundary Awareness through Reasoning Step Confidence Calibration | May 29, 2025 | Hallucination, Multimodal Reasoning | Code Available | 0 |
| Elicit and Enhance: Advancing Multimodal Reasoning in Medical Scenarios | May 29, 2025 | Multimodal Reasoning | Unverified | 0 |
| VisualSphinx: Large-Scale Synthetic Vision Logic Puzzles for RL | May 29, 2025 | Arithmetic Reasoning, Image Generation | Unverified | 0 |
| Infi-MMR: Curriculum-based Unlocking Multimodal Reasoning via Phased Reinforcement Learning in Multimodal Small Language Models | May 29, 2025 | Logical Reasoning, Math | Unverified | 0 |
| SAM-R1: Leveraging SAM for Reward Feedback in Multimodal Segmentation via Reinforcement Learning | May 28, 2025 | Image Segmentation, Multimodal Reasoning | Unverified | 0 |