| LaViPlan : Language-Guided Visual Path Planning with RLVR | Jul 17, 2025 | Autonomous Driving, Vision-Language-Action | Unverified | 0 |
| Beyond Task-Specific Reasoning: A Unified Conditional Generative Framework for Abstract Visual Reasoning | Jul 15, 2025 | Visual Reasoning | Code Available | 0 |
| PyVision: Agentic Vision with Dynamic Tooling | Jul 10, 2025 | Visual Reasoning | Unverified | 0 |
| MagiC: Evaluating Multimodal Cognition Toward Grounded Visual Reasoning | Jul 9, 2025 | Diagnostic, Multimodal Reasoning | Unverified | 0 |
| Orchestrator-Agent Trust: A Modular Agentic AI Visual Classification System with Trust-Aware Orchestration and RAG-Based Reasoning | Jul 9, 2025 | Benchmarking, Image Retrieval | Code Available | 0 |
| High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning | Jul 8, 2025 | MME, Reinforcement Learning (RL) | Code Available | 2 |
| Skywork-R1V3 Technical Report | Jul 8, 2025 | cross-modal alignment, Mathematical Reasoning | Code Available | 7 |
| Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning | Jul 7, 2025 | Reinforcement Learning (RL), Visual Reasoning | Unverified | 0 |
| Foundation Models for Zero-Shot Segmentation of Scientific Images without AI-Ready Data | Jun 30, 2025 | Visual Reasoning, Zero Shot Segmentation | Unverified | 0 |
| MiCo: Multi-image Contrast for Reinforcement Visual Reasoning | Jun 27, 2025 | Logical Reasoning, Representation Learning | Unverified | 0 |
| Visual Structures Helps Visual Reasoning: Addressing the Binding Problem in VLMs | Jun 27, 2025 | Visual Reasoning | Unverified | 0 |
| HalluSegBench: Counterfactual Visual Reasoning for Segmentation Hallucination Evaluation | Jun 26, 2025 | counterfactual, Counterfactual Reasoning | Unverified | 0 |
| World-aware Planning Narratives Enhance Large Vision-Language Model Planner | Jun 26, 2025 | Imitation Learning, Language Modeling | Unverified | 0 |
| Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens | Jun 20, 2025 | Image Generation, Multimodal Reasoning | Code Available | 3 |
| VLM@school -- Evaluation of AI image understanding on German middle school knowledge | Jun 13, 2025 | Visual Reasoning | Unverified | 0 |
| VGR: Visual Grounded Reasoning | Jun 13, 2025 | Large Language Model, Math | Unverified | 0 |
| LLMs Are Not Yet Ready for Deepfake Image Detection | Jun 12, 2025 | DeepFake Detection, Face Swapping | Unverified | 0 |
| Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning | Jun 11, 2025 | Image Captioning, Math | Code Available | 2 |
| ChartReasoner: Code-Driven Modality Bridging for Long-Chain Reasoning in Chart Question Answering | Jun 11, 2025 | Chart Question Answering, Image to text | Unverified | 0 |
| Socratic-MCTS: Test-Time Visual Reasoning by Asking the Right Questions | Jun 10, 2025 | Visual Reasoning | Unverified | 0 |
| VReST: Enhancing Reasoning in Large Vision-Language Models through Tree Search and Self-Reward Mechanism | Jun 10, 2025 | Mathematical Reasoning, Visual Reasoning | Code Available | 0 |
| VIKI-R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning | Jun 10, 2025 | Task Planning, Visual Reasoning | Unverified | 0 |
| KokushiMD-10: Benchmark for Evaluating Large Language Models on Ten Japanese National Healthcare Licensing Examinations | Jun 9, 2025 | Multimodal Reasoning, Visual Reasoning | Unverified | 0 |
| Language-Vision Planner and Executor for Text-to-Visual Reasoning | Jun 9, 2025 | In-Context Learning, MME | Unverified | 0 |
| Synthetic Visual Genome | Jun 9, 2025 | Referring Expression, Referring Expression Comprehension | Unverified | 0 |
| Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning | Jun 8, 2025 | Attribute, Hallucination | Unverified | 0 |
| Multi-Step Visual Reasoning with Visual Tokens Scaling and Verification | Jun 8, 2025 | Question Answering, Visual Question Answering | Code Available | 1 |
| MATP-BENCH: Can MLLM Be a Good Automated Theorem Prover for Multimodal Problems? | Jun 6, 2025 | Automated Theorem Proving, Visual Reasoning | Unverified | 0 |
| MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning | Jun 5, 2025 | Math, Mathematical Reasoning | Code Available | 2 |
| ControlThinker: Unveiling Latent Semantics for Controllable Image Generation through Visual Reasoning | Jun 4, 2025 | Image Generation, Visual Reasoning | Code Available | 0 |
| Evaluating MLLMs with Multimodal Multi-image Reasoning Benchmark | Jun 4, 2025 | Sentence, Visual Reasoning | Unverified | 0 |
| SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis | Jun 2, 2025 | 8k, Math | Unverified | 0 |
| DrVD-Bench: Do Vision-Language Models Reason Like Human Doctors in Medical Image Diagnosis? | May 30, 2025 | Diagnostic, Medical Image Analysis | Code Available | 1 |
| Reinforcing Video Reasoning with Focused Thinking | May 30, 2025 | Data Augmentation, Visual Reasoning | Code Available | 1 |
| ProxyThinker: Test-Time Guidance through Small Visual Reasoners | May 30, 2025 | Visual Reasoning | Code Available | 1 |
| Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents | May 30, 2025 | Benchmarking, Blocking | Code Available | 2 |
| Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks | May 30, 2025 | Autonomous Driving, Math | Code Available | 1 |
| MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM | May 30, 2025 | Hallucination, Multimodal Reasoning | Unverified | 0 |
| Seeing is Not Reasoning: MVPBench for Graph-based Evaluation of Multi-path Visual Physical CoT | May 30, 2025 | Spatial Reasoning, Visual Reasoning | Code Available | 1 |
| GAM-Agent: Game-Theoretic and Uncertainty-Aware Collaboration for Complex Visual Reasoning | May 29, 2025 | Multimodal Reasoning, MVBench | Unverified | 0 |
| Grounded Reinforcement Learning for Visual Reasoning | May 29, 2025 | reinforcement-learning, Reinforcement Learning | Code Available | 0 |
| Thinking with Generated Images | May 28, 2025 | Visual Reasoning | Code Available | 0 |
| Beyond Perception: Evaluating Abstract Visual Reasoning through Multi-Stage Task | May 28, 2025 | Visual Reasoning | Code Available | 0 |
| OmniAD: Detect and Understand Industrial Anomaly via Multimodal Reasoning | May 28, 2025 | Anomaly Detection, Multimodal Reasoning | Unverified | 0 |
| Understand, Think, and Answer: Advancing Visual Reasoning with Large Multimodal Models | May 27, 2025 | Question Answering, Visual Reasoning | Code Available | 0 |
| Point-RFT: Improving Multimodal Reasoning with Visually Grounded Reinforcement Finetuning | May 26, 2025 | document understanding, Multimodal Reasoning | Unverified | 0 |
| VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models | May 26, 2025 | Visual Reasoning | Unverified | 0 |
| VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection | May 26, 2025 | Diversity, reinforcement-learning | Unverified | 0 |
| Visual Abstract Thinking Empowers Multimodal Reasoning | May 26, 2025 | Multimodal Reasoning, Relational Reasoning | Code Available | 1 |
| Ground-R1: Incentivizing Grounded Visual Reasoning via Reinforcement Learning | May 26, 2025 | reinforcement-learning, Reinforcement Learning | Unverified | 0 |