| Proof of Thought : Neurosymbolic Program Synthesis allows Robust and Interpretable Reasoning | Sep 25, 2024 | Benchmarking, Formal Logic | Unverified | 0 | 0 |
| PRS-Med: Position Reasoning Segmentation with Vision-Language Model in Medical Imaging | May 17, 2025 | Image Segmentation, Language Modeling | Unverified | 0 | 0 |
| An X-Ray Is Worth 15 Features: Sparse Autoencoders for Interpretable Radiology Report Generation | Oct 4, 2024 | Language Modelling, Multimodal Reasoning | Unverified | 0 | 0 |
| Q-Heart: ECG Question Answering via Knowledge-Informed Multimodal LLMs | May 7, 2025 | Electrocardiography (ECG), Language Modeling | Unverified | 0 | 0 |
| VisualPRM: An Effective Process Reward Model for Multimodal Reasoning | Mar 13, 2025 | Multimodal Reasoning | Unverified | 0 | 0 |
| Question Aware Vision Transformer for Multimodal Reasoning | Feb 8, 2024 | Language Modeling, Language Modelling | Unverified | 0 | 0 |
| All-in-one: Understanding and Generation in Multimodal Reasoning with the MAIA Benchmark | Feb 24, 2025 | All, Multimodal Reasoning | Unverified | 0 | 0 |
| Agentic Multimodal AI for Hyperpersonalized B2B and B2C Advertising in Competitive Markets: An AI-Driven Competitive Advertising Framework | Apr 1, 2025 | Decision Making, In-Context Learning | Unverified | 0 | 0 |
| Agentic 3D Scene Generation with Spatially Contextualized VLMs | May 26, 2025 | Multimodal Reasoning, Scene Generation | Unverified | 0 | 0 |
| RadFabric: Agentic AI System with Reasoning Capability for Radiology | Jun 17, 2025 | Diagnostic, Multimodal Reasoning | Unverified | 0 | 0 |
| R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation | May 4, 2025 | Language Model Evaluation, Language Modeling | Unverified | 0 | 0 |
| Reducing the Vision and Language Bias for Temporal Sentence Grounding | Jul 27, 2022 | Information Retrieval, Multimodal Reasoning | Unverified | 0 | 0 |
| Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models | Apr 30, 2025 | Multimodal Reasoning, Reinforcement Learning (RL) | Unverified | 0 | 0 |
| A Generative Framework for Bidirectional Image-Report Understanding in Chest Radiography | Feb 9, 2025 | Diagnostic, Multimodal Reasoning | Unverified | 0 | 0 |
| Retrieval Meets Reasoning: Even High-school Textbook Knowledge Benefits Multimodal Reasoning | May 31, 2024 | Answer Generation, Multimodal Reasoning | Unverified | 0 | 0 |
| Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark | Jul 17, 2025 | Multimodal Reasoning, Pose Estimation | Unverified | 0 | 0 |
| RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis | Feb 25, 2024 | Code Generation, Multimodal Reasoning | Unverified | 0 | 0 |
| RSVP: Reasoning Segmentation via Visual Prompting and Multi-modal Chain-of-Thought | Jun 4, 2025 | Multimodal Reasoning, Reasoning Segmentation | Unverified | 0 | 0 |
| VisualPuzzles: Decoupling Multimodal Reasoning Evaluation from Domain Knowledge | Apr 14, 2025 | Logical Reasoning, Multimodal Reasoning | Unverified | 0 | 0 |
| SAM-R1: Leveraging SAM for Reward Feedback in Multimodal Segmentation via Reinforcement Learning | May 28, 2025 | Image Segmentation, Multimodal Reasoning | Unverified | 0 | 0 |
| VisualQuest: A Diverse Image Dataset for Evaluating Visual Recognition in LLMs | Mar 25, 2025 | Diversity, Multimodal Reasoning | Unverified | 0 | 0 |
| Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning | Jun 12, 2025 | Attribute, Multimodal Reasoning | Unverified | 0 | 0 |
| Seed1.5-VL Technical Report | May 11, 2025 | Mixture-of-Experts, Multimodal Reasoning | Unverified | 0 | 0 |
| Seeing and Reasoning with Confidence: Supercharging Multimodal LLMs with an Uncertainty-Aware Agentic Framework | Mar 11, 2025 | Conformal Prediction, Multimodal Reasoning | Unverified | 0 | 0 |
| Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI | Feb 24, 2025 | document understanding, Multimodal Reasoning | Unverified | 0 | 0 |
| VisualSphinx: Large-Scale Synthetic Vision Logic Puzzles for RL | May 29, 2025 | Arithmetic Reasoning, Image Generation | Unverified | 0 | 0 |
| Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning | Jun 4, 2025 | Multimodal Reasoning, Reinforcement Learning (RL) | Unverified | 0 | 0 |
| Advancing Conversational Diagnostic AI with Multimodal Reasoning | May 6, 2025 | Diagnostic, Management | Unverified | 0 | 0 |
| Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning | May 12, 2025 | Multimodal Reasoning | Unverified | 0 | 0 |
| SlowFastVAD: Video Anomaly Detection via Integrating Simple Detector and RAG-Enhanced Vision-Language Model | Apr 14, 2025 | Anomaly Detection, Domain Adaptation | Unverified | 0 | 0 |
| Sound2Sight: Generating Visual Dynamics from Sound and Context | Jul 23, 2020 | Multimodal Reasoning, Video Forecasting | Unverified | 0 | 0 |
| SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning | Jun 2, 2025 | Multimodal Reasoning, reinforcement-learning | Unverified | 0 | 0 |
| Stacked Latent Attention for Multimodal Reasoning | Jun 1, 2018 | Image Captioning, Multimodal Reasoning | Unverified | 0 | 0 |
| SToLa: Self-Adaptive Touch-Language Framework with Tactile Commonsense Reasoning in Open-Ended Scenarios | May 7, 2025 | Diversity, Mixture-of-Experts | Unverified | 0 | 0 |
| VL-GenRM: Enhancing Vision-Language Verification via Vision Experts and Iterative Training | Jun 16, 2025 | Hallucination, Multimodal Reasoning | Unverified | 0 | 0 |
| Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval | Mar 26, 2024 | Multimodal Reasoning, Retrieval | Unverified | 0 | 0 |
| AdaToken-3D: Dynamic Spatial Gating for Efficient 3D Large Multimodal-Models Reasoning | May 19, 2025 | Multimodal Reasoning, Scene Understanding | Unverified | 0 | 0 |
| The Synergy Dilemma of Long-CoT SFT and RL: Investigating Post-Training Techniques for Reasoning VLMs | Jul 10, 2025 | Multimodal Reasoning, Reinforcement Learning (RL) | Unverified | 0 | 0 |
| VLMGuard-R1: Proactive Safety Alignment for VLMs via Reasoning-Driven Prompt Optimization | Apr 17, 2025 | Multimodal Reasoning, Safety Alignment | Unverified | 0 | 0 |
| Diving into Self-Evolving Training for Multimodal Reasoning | Dec 23, 2024 | Multimodal Reasoning | Unverified | 0 | 0 |
| DOC2PPT: Automatic Presentation Slides Generation from Scientific Documents | Jan 28, 2021 | Document Summarization, Multimodal Reasoning | Unverified | 0 | 0 |
| DisinfoMeme: A Multimodal Dataset for Detecting Meme Intentionally Spreading Out Disinformation | May 25, 2022 | Multimodal Reasoning, Optical Character Recognition (OCR) | Unverified | 0 | 0 |
| Adapting Vision-Language Models for Evaluating World Models | Jun 22, 2025 | Action Recognition, Multimodal Reasoning | Unverified | 0 | 0 |
| Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation | May 24, 2025 | Mathematical Reasoning, Multimodal Reasoning | Unverified | 0 | 0 |
| Draw with Thought: Unleashing Multimodal Reasoning for Scientific Diagram Generation | Apr 13, 2025 | Code Generation, Multimodal Reasoning | Unverified | 0 | 0 |
| DreamPRM: Domain-Reweighted Process Reward Model for Multimodal Reasoning | May 26, 2025 | Meta-Learning, Multimodal Reasoning | Unverified | 0 | 0 |
| Deep Neural Networks for Visual Reasoning | Sep 24, 2022 | Multimodal Reasoning, Visual Reasoning | Unverified | 0 | 0 |
| DRIVINGVQA: Analyzing Visual Chain-of-Thought Reasoning of Vision Language Models in Real-World Scenarios with Driving Theory Tests | Jan 8, 2025 | Multimodal Reasoning, Multiple-choice | Unverified | 0 | 0 |
| EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models | Mar 19, 2025 | MM-Vet, Multimodal Reasoning | Unverified | 0 | 0 |
| EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models | Jan 1, 2025 | MM-Vet, Multimodal Reasoning | Unverified | 0 | 0 |