| SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis | Jun 2, 2025 | 8k, Math | Unverified | 0 |
| MIRAGE: Assessing Hallucination in Multimodal Reasoning Chains of MLLM | May 30, 2025 | Hallucination, Multimodal Reasoning | Unverified | 0 |
| Grounded Reinforcement Learning for Visual Reasoning | May 29, 2025 | reinforcement-learning, Reinforcement Learning | Unverified | 0 |
| GAM-Agent: Game-Theoretic and Uncertainty-Aware Collaboration for Complex Visual Reasoning | May 29, 2025 | Multimodal Reasoning, MVBench | Unverified | 0 |
| OmniAD: Detect and Understand Industrial Anomaly via Multimodal Reasoning | May 28, 2025 | Anomaly Detection, Multimodal Reasoning | Unverified | 0 |
| Thinking with Generated Images | May 28, 2025 | Visual Reasoning | Code Available | 0 |
| Beyond Perception: Evaluating Abstract Visual Reasoning through Multi-Stage Task | May 28, 2025 | Visual Reasoning | Code Available | 0 |
| Understand, Think, and Answer: Advancing Visual Reasoning with Large Multimodal Models | May 27, 2025 | Question Answering, Visual Reasoning | Unverified | 0 |
| VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models | May 26, 2025 | Visual Reasoning | Unverified | 0 |
| Ground-R1: Incentivizing Grounded Visual Reasoning via Reinforcement Learning | May 26, 2025 | reinforcement-learning, Reinforcement Learning | Unverified | 0 |
| Point-RFT: Improving Multimodal Reasoning with Visually Grounded Reinforcement Finetuning | May 26, 2025 | document understanding, Multimodal Reasoning | Unverified | 0 |
| Seeing is Believing, but How Much? A Comprehensive Analysis of Verbalized Calibration in Vision-Language Models | May 26, 2025 | Uncertainty Quantification, Visual Reasoning | Unverified | 0 |
| VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection | May 26, 2025 | Diversity, reinforcement-learning | Unverified | 0 |
| ChartSketcher: Reasoning with Multimodal Feedback and Reflection for Chart Understanding | May 25, 2025 | Chart Understanding, Logical Reasoning | Code Available | 0 |
| The Eye of Sherlock Holmes: Uncovering User Private Attribute Profiling via Vision-Language Model Agentic Framework | May 25, 2025 | Attribute, Language Modeling | Unverified | 0 |
| Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps | May 24, 2025 | Scene Understanding, Spatial Reasoning | Unverified | 0 |
| Doc-CoB: Enhancing Multi-Modal Document Understanding with Visual Chain-of-Boxes Reasoning | May 24, 2025 | document understanding, Visual Reasoning | Unverified | 0 |
| FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving | May 23, 2025 | Autonomous Driving, Image Generation | Unverified | 0 |
| One RL to See Them All: Visual Triple Unified Reinforcement Learning | May 23, 2025 | All, Math | Unverified | 0 |
| RBench-V: A Primary Assessment for Visual Reasoning Models with Multi-modal Outputs | May 22, 2025 | Image Manipulation, Math | Unverified | 0 |
| OCR-Reasoning Benchmark: Unveiling the True Capabilities of MLLMs in Complex Text-Rich Image Reasoning | May 22, 2025 | Optical Character Recognition (OCR), Visual Reasoning | Code Available | 0 |
| GRIT: Teaching MLLMs to Think with Images | May 21, 2025 | Reinforcement Learning (RL), Visual Reasoning | Unverified | 0 |
| Chain-of-Focus: Adaptive Visual Search and Zooming for Multimodal Reasoning via RL | May 21, 2025 | 4k, Multimodal Reasoning | Unverified | 0 |
| STAR-R1: Spacial TrAnsformation Reasoning by Reinforcing Multimodal LLMs | May 21, 2025 | Efficient Exploration, Reinforcement Learning (RL) | Code Available | 0 |
| Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning | May 21, 2025 | Reinforcement Learning (RL), Visual Reasoning | Unverified | 0 |
| Visionary-R1: Mitigating Shortcuts in Visual Reasoning with Reinforcement Learning | May 20, 2025 | reinforcement-learning, Reinforcement Learning | Unverified | 0 |
| Advancing Generalization Across a Variety of Abstract Visual Reasoning Tasks | May 19, 2025 | Visual Reasoning | Unverified | 0 |
| ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models | May 19, 2025 | Chart Question Answering, Chart Understanding | Unverified | 0 |
| ViPlan: A Benchmark for Visual Planning with Symbolic Predicates and Vision-Language Models | May 19, 2025 | Visual Reasoning | Code Available | 0 |
| RVTBench: A Benchmark for Visual Reasoning Tasks | May 17, 2025 | Reasoning Segmentation, Visual Question Answering (VQA) | Code Available | 0 |
| Human-Aligned Bench: Fine-Grained Assessment of Reasoning Ability in MLLMs vs. Humans | May 16, 2025 | Multimodal Reasoning, Visual Reasoning | Unverified | 0 |
| Leveraging Vision-Language Models for Visual Grounding and Analysis of Automotive UI | May 9, 2025 | 4k, Domain Generalization | Code Available | 0 |
| VLM Q-Learning: Aligning Vision-Language Models for Interactive Decision-Making | May 6, 2025 | Decision Making, General Knowledge | Unverified | 0 |
| A Survey of Slow Thinking-based Reasoning LLMs using Reinforced Learning and Inference-time Scaling Law | May 5, 2025 | Math, Medical Diagnosis | Unverified | 0 |
| Localizing Before Answering: A Hallucination Evaluation Benchmark for Grounded Medical Multimodal LLMs | Apr 30, 2025 | Hallucination, Hallucination Evaluation | Unverified | 0 |
| NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks | Apr 28, 2025 | Task Planning, Vision-Language-Action | Unverified | 0 |
| Doxing via the Lens: Revealing Location-related Privacy Leakage on Multi-modal Large Reasoning Models | Apr 27, 2025 | Visual Reasoning, World Knowledge | Unverified | 0 |
| A Comprehensive Survey of Knowledge-Based Vision Question Answering Systems: The Lifecycle of Knowledge in Visual Reasoning Task | Apr 24, 2025 | Question Answering, Retrieval | Unverified | 0 |
| LongPerceptualThoughts: Distilling System-2 Reasoning for System-1 Perception | Apr 21, 2025 | Math, MMLU | Unverified | 0 |
| VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models | Apr 21, 2025 | Attribute, Visual Reasoning | Unverified | 0 |
| Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? | Apr 18, 2025 | Math, Visual Reasoning | Unverified | 0 |
| LVLM_CSP: Accelerating Large Vision Language Models via Clustering, Scattering, and Pruning for Reasoning Segmentation | Apr 15, 2025 | Image Captioning, Question Answering | Unverified | 0 |
| Visual Language Models show widespread visual deficits on neuropsychological tests | Apr 15, 2025 | Object Recognition, Visual Reasoning | Unverified | 0 |
| CameraBench: Benchmarking Visual Reasoning in MLLMs via Photography | Apr 14, 2025 | Benchmarking, Visual Reasoning | Unverified | 0 |
| VisualPuzzles: Decoupling Multimodal Reasoning Evaluation from Domain Knowledge | Apr 14, 2025 | Logical Reasoning, Multimodal Reasoning | Unverified | 0 |
| TGraphX: Tensor-Aware Graph Neural Network for Multi-Dimensional Feature Learning | Apr 4, 2025 | Graph Neural Network, object-detection | Code Available | 0 |
| On Data Synthesis and Post-training for Visual Abstract Reasoning | Apr 2, 2025 | Visual Reasoning | Unverified | 0 |
| TDBench: Benchmarking Vision-Language Models in Understanding Top-Down Images | Apr 1, 2025 | Autonomous Navigation, Benchmarking | Code Available | 0 |
| GenVP: Generating Visual Puzzles with Contrastive Hierarchical VAEs | Mar 30, 2025 | Visual Reasoning | Unverified | 0 |
| DWIM: Towards Tool-aware Visual Reasoning via Discrepancy-aware Workflow Generation & Instruct-Masking Tuning | Mar 25, 2025 | Visual Reasoning | Unverified | 0 |