| EgoPrune: Efficient Token Pruning for Egomotion Video Reasoning in Embodied Agent | Jul 21, 2025 | Multimodal Reasoning | Unverified | 0 |
| Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark | Jul 17, 2025 | Multimodal Reasoning, Pose Estimation | Unverified | 0 |
| The Synergy Dilemma of Long-CoT SFT and RL: Investigating Post-Training Techniques for Reasoning VLMs | Jul 10, 2025 | Multimodal Reasoning, Reinforcement Learning (RL) | Unverified | 0 |
| MagiC: Evaluating Multimodal Cognition Toward Grounded Visual Reasoning | Jul 9, 2025 | Diagnostic, Multimodal Reasoning | Unverified | 0 |
| Perception-Aware Policy Optimization for Multimodal Reasoning | Jul 8, 2025 | Multimodal Reasoning | Unverified | 0 |
| Skywork-R1V3 Technical Report | Jul 8, 2025 | cross-modal alignment, Mathematical Reasoning | Code Available | 7 |
| Enhancing Scientific Visual Question Answering through Multimodal Reasoning and Ensemble Modeling | Jul 8, 2025 | Articles, Multimodal Reasoning | Unverified | 0 |
| DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge | Jul 6, 2025 | Image Generation, Multimodal Reasoning | Code Available | 3 |
| GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning | Jul 1, 2025 | document understanding, Multimodal Reasoning | Code Available | 7 |
| Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers | Jun 30, 2025 | Multimodal Reasoning | Code Available | 5 |
| APO: Enhancing Reasoning Ability of MLLMs via Asymmetric Policy Optimization | Jun 26, 2025 | Multimodal Reasoning, Reinforcement Learning (RL) | Code Available | 0 |
| HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context | Jun 26, 2025 | Large Language Model, Multimodal Reasoning | Code Available | 2 |
| MultiFinRAG: An Optimized Multimodal Retrieval-Augmented Generation (RAG) Framework for Financial Question Answering | Jun 25, 2025 | Multimodal Reasoning, Question Answering | Unverified | 0 |
| Adapting Vision-Language Models for Evaluating World Models | Jun 22, 2025 | Action Recognition, Multimodal Reasoning | Unverified | 0 |
| Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens | Jun 20, 2025 | Image Generation, Multimodal Reasoning | Code Available | 3 |
| GRPO-CARE: Consistency-Aware Reinforcement Learning for Multimodal Reasoning | Jun 19, 2025 | Multimodal Reasoning, reinforcement-learning | Unverified | 0 |
| GeoGuess: Multimodal Reasoning based on Hierarchy of Visual Information in Street View | Jun 19, 2025 | Multimodal Reasoning | Unverified | 0 |
| MEGC2025: Micro-Expression Grand Challenge on Spot Then Recognize and Visual Question Answering | Jun 18, 2025 | Multimodal Reasoning, Question Answering | Unverified | 0 |
| PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning | Jun 17, 2025 | General Reinforcement Learning, Multimodal Reasoning | Unverified | 0 |
| RadFabric: Agentic AI System with Reasoning Capability for Radiology | Jun 17, 2025 | Diagnostic, Multimodal Reasoning | Unverified | 0 |
| Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning | Jun 16, 2025 | Multimodal Reasoning, Reinforcement Learning (RL) | Code Available | 1 |
| FinLMM-R1: Enhancing Financial Reasoning in LMM through Scalable Data and Reward Design | Jun 16, 2025 | Answer Generation, Arithmetic Reasoning | Unverified | 0 |
| VL-GenRM: Enhancing Vision-Language Verification via Vision Experts and Iterative Training | Jun 16, 2025 | Hallucination, Multimodal Reasoning | Unverified | 0 |
| MM-R5: MultiModal Reasoning-Enhanced ReRanker via Reinforcement Learning for Document Retrieval | Jun 14, 2025 | Instruction Following, Multimodal Reasoning | Code Available | 0 |
| FedNano: Toward Lightweight Federated Tuning for Pretrained Multimodal Large Language Models | Jun 12, 2025 | Cross-Modal Retrieval, Federated Learning | Unverified | 0 |
| Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts | Jun 12, 2025 | Diversity, Minecraft | Unverified | 0 |
| Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning | Jun 12, 2025 | Attribute, Multimodal Reasoning | Unverified | 0 |
| MMMG: A Massive, Multidisciplinary, Multi-Tier Generation Benchmark for Text-to-Image Reasoning | Jun 12, 2025 | Image Generation, Multimodal Reasoning | Unverified | 0 |
| Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing | Jun 11, 2025 | Multimodal Reasoning, Spatial Reasoning | Code Available | 2 |
| ChartReasoner: Code-Driven Modality Bridging for Long-Chain Reasoning in Chart Question Answering | Jun 11, 2025 | Chart Question Answering, Image to text | Unverified | 0 |
| Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency | Jun 10, 2025 | Multimodal Reasoning | Unverified | 0 |
| KokushiMD-10: Benchmark for Evaluating Large Language Models on Ten Japanese National Healthcare Licensing Examinations | Jun 9, 2025 | Multimodal Reasoning, Visual Reasoning | Unverified | 0 |
| WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning | Jun 9, 2025 | Math, Mathematical Reasoning | Code Available | 1 |
| Play to Generalize: Learning to Reason Through Game Play | Jun 9, 2025 | Domain Generalization, Math | Code Available | 2 |
| Decoupling the Image Perception and Multimodal Reasoning for Reasoning Segmentation with Digital Twin Representations | Jun 9, 2025 | Large Language Model, Multimodal Reasoning | Unverified | 0 |
| Learning Compact Vision Tokens for Efficient Large Multimodal Models | Jun 8, 2025 | Multimodal Reasoning, Token Reduction | Code Available | 1 |
| MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning | Jun 5, 2025 | Dataset Generation, Mathematical Problem-Solving | Code Available | 1 |
| Look Before You Leap: A GUI-Critic-R1 Model for Pre-Operative Error Diagnosis in GUI Automation | Jun 5, 2025 | Decision Making, Multimodal Reasoning | Unverified | 0 |
| MuSciClaims: Multimodal Scientific Claim Verification | Jun 5, 2025 | Articles, Claim Verification | Unverified | 0 |
| MMR-V: What's Left Unsaid? A Benchmark for Multimodal Deep Reasoning in Videos | Jun 4, 2025 | Multimodal Reasoning | Unverified | 0 |
| Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning | Jun 4, 2025 | Multimodal Reasoning, Reinforcement Learning (RL) | Unverified | 0 |
| MiMo-VL Technical Report | Jun 4, 2025 | Multimodal Reasoning | Code Available | 4 |
| RSVP: Reasoning Segmentation via Visual Prompting and Multi-modal Chain-of-Thought | Jun 4, 2025 | Multimodal Reasoning, Reasoning Segmentation | Unverified | 0 |
| SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning | Jun 2, 2025 | Multimodal Reasoning, reinforcement-learning | Unverified | 0 |
| GThinker: Towards General Multimodal Reasoning via Cue-Guided Rethinking | Jun 1, 2025 | 4k, Math | Code Available | 0 |
| MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM | May 30, 2025 | Hallucination, Multimodal Reasoning | Unverified | 0 |
| Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents | May 30, 2025 | Benchmarking, Blocking | Code Available | 2 |
| Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks | May 30, 2025 | Autonomous Driving, Math | Code Available | 1 |
| Preemptive Hallucination Reduction: An Input-Level Approach for Multimodal Language Model | May 29, 2025 | Hallucination, Language Modeling | Unverified | 0 |
| Infi-Med: Low-Resource Medical MLLMs with Robust Reasoning Evaluation | May 29, 2025 | Diagnostic, Multimodal Reasoning | Unverified | 0 |