| SToLa: Self-Adaptive Touch-Language Framework with Tactile Commonsense Reasoning in Open-Ended Scenarios | May 7, 2025 | Diversity, Mixture-of-Experts | Unverified | 0 |
| X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains | May 6, 2025 | Multimodal Reasoning | Unverified | 0 |
| Advancing Conversational Diagnostic AI with Multimodal Reasoning | May 6, 2025 | Diagnostic, Management | Unverified | 0 |
| R-Bench: Graduate-level Multi-disciplinary Benchmarks for LLM & MLLM Complex Reasoning Evaluation | May 4, 2025 | Language Model Evaluation, Language Modeling | Unverified | 0 |
| Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models | Apr 30, 2025 | Multimodal Reasoning, Reinforcement Learning (RL) | Unverified | 0 |
| MultiMind: Enhancing Werewolf Agents with Multimodal Reasoning and Theory of Mind | Apr 25, 2025 | Large Language Model, Multimodal Reasoning | Unverified | 0 |
| VideoMultiAgents: A Multi-Agent Framework for Video Question Answering | Apr 25, 2025 | Caption Generation, EgoSchema | Code Available | 1 |
| Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning | Apr 23, 2025 | Multimodal Reasoning, reinforcement-learning | Code Available | 7 |
| VLMGuard-R1: Proactive Safety Alignment for VLMs via Reasoning-Driven Prompt Optimization | Apr 17, 2025 | Multimodal Reasoning, Safety Alignment | Unverified | 0 |
| GeoSense: Evaluating Identification and Application of Geometric Principles in Multimodal Reasoning | Apr 17, 2025 | Geometry Problem Solving, Multimodal Reasoning | Unverified | 0 |
| Embodied-R: Collaborative Framework for Activating Embodied Spatial Reasoning in Foundation Models via Reinforcement Learning | Apr 17, 2025 | Multimodal Reasoning, Reinforcement Learning (RL) | Code Available | 2 |
| Structured Graph Representations for Visual Narrative Reasoning: A Hierarchical Framework for Comics | Apr 14, 2025 | Knowledge Graphs, Multimodal Reasoning | Unverified | 0 |
| SlowFastVAD: Video Anomaly Detection via Integrating Simple Detector and RAG-Enhanced Vision-Language Model | Apr 14, 2025 | Anomaly Detection, Domain Adaptation | Unverified | 0 |
| Breaking the Data Barrier -- Building GUI Agents Through Task Generalization | Apr 14, 2025 | Mathematical Reasoning, Multimodal Reasoning | Code Available | 1 |
| VisualPuzzles: Decoupling Multimodal Reasoning Evaluation from Domain Knowledge | Apr 14, 2025 | Logical Reasoning, Multimodal Reasoning | Unverified | 0 |
| Draw with Thought: Unleashing Multimodal Reasoning for Scientific Diagram Generation | Apr 13, 2025 | Code Generation, Multimodal Reasoning | Unverified | 0 |
| HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation | Apr 13, 2025 | Multimodal Reasoning, RAG | Code Available | 2 |
| NoTeS-Bank: Benchmarking Neural Transcription and Search for Scientific Notes Understanding | Apr 12, 2025 | Benchmarking, Document AI | Unverified | 0 |
| VLMT: Vision-Language Multimodal Transformer for Multimodal Multi-hop Question Answering | Apr 11, 2025 | cross-modal alignment, Information Retrieval | Unverified | 0 |
| VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning | Apr 10, 2025 | Math, Multimodal Reasoning | Code Available | 2 |
| Kimi-VL Technical Report | Apr 10, 2025 | Long-Context Understanding, Mathematical Reasoning | Code Available | 5 |
| MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models | Apr 8, 2025 | Math, Multimodal Reasoning | Code Available | 1 |
| Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought | Apr 8, 2025 | Language Modeling, Language Modelling | Code Available | 7 |
| Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1) | Apr 4, 2025 | Multimodal Reasoning | Unverified | 0 |
| MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models | Apr 4, 2025 | Benchmarking, Image Generation | Unverified | 0 |
| Affordable AI Assistants with Knowledge Graph of Thoughts | Apr 3, 2025 | Knowledge Graphs, LLM real-life tasks | Code Available | 3 |
| FortisAVQA and MAVEN: a Benchmark Dataset and Debiasing Framework for Robust Multimodal Reasoning | Apr 1, 2025 | Audio-visual Question Answering, Audio-Visual Question Answering (AVQA) | Code Available | 2 |
| Agentic Multimodal AI for Hyperpersonalized B2B and B2C Advertising in Competitive Markets: An AI-Driven Competitive Advertising Framework | Apr 1, 2025 | Decision Making, In-Context Learning | Unverified | 0 |
| Boosting MLLM Reasoning with Text-Debiased Hint-GRPO | Mar 31, 2025 | Mathematical Reasoning, Multimodal Reasoning | Code Available | 1 |
| Evolutionary Prompt Optimization Discovers Emergent Multimodal Reasoning Strategies in Vision-Language Models | Mar 30, 2025 | Image Segmentation, Language Modeling | Unverified | 0 |
| 3MDBench: Medical Multimodal Multi-agent Dialogue Benchmark | Mar 26, 2025 | Diagnostic, Multimodal Reasoning | Code Available | 1 |
| VisualQuest: A Diverse Image Dataset for Evaluating Visual Recognition in LLMs | Mar 25, 2025 | Diversity, Multimodal Reasoning | Unverified | 0 |
| Training-Free Personalization via Retrieval and Reasoning on Fingerprints | Mar 24, 2025 | Attribute, Multimodal Reasoning | Unverified | 0 |
| Mind with Eyes: from Language Reasoning to Multimodal Reasoning | Mar 23, 2025 | Action Generation, Multimodal Reasoning | Unverified | 0 |
| OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement | Mar 21, 2025 | Multimodal Reasoning, Reinforcement Learning (RL) | Code Available | 2 |
| Towards Agentic Recommender Systems in the Era of Multimodal Large Language Models | Mar 20, 2025 | Multimodal Reasoning, Recommendation Systems | Unverified | 0 |
| EfficientLLaVA:Generalizable Auto-Pruning for Large Vision-language Models | Mar 19, 2025 | MM-Vet, Multimodal Reasoning | Unverified | 0 |
| LLaVA-MORE: A Comparative Study of LLMs and Visual Backbones for Enhanced Visual Instruction Tuning | Mar 19, 2025 | Instruction Following, Multimodal Reasoning | Code Available | 2 |
| Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning | Mar 17, 2025 | Mathematical Reasoning, Multimodal Reasoning | Unverified | 0 |
| MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research | Mar 17, 2025 | Articles, Benchmarking | Code Available | 1 |
| DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding | Mar 17, 2025 | Domain Generalization, Multimodal Reasoning | Code Available | 2 |
| MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification | Mar 16, 2025 | Multimodal Reasoning | Unverified | 0 |
| Will Pre-Training Ever End? A First Step Toward Next-Generation Foundation MLLMs via Self-Improving Systematic Cognition | Mar 16, 2025 | Caption Generation, Image Captioning | Code Available | 1 |
| VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity | Mar 14, 2025 | Benchmarking, Decision Making | Unverified | 0 |
| Chat-TS: Enhancing Multi-Modal Reasoning Over Time-Series and Natural Language Data | Mar 13, 2025 | Large Language Model, Math | Unverified | 0 |
| How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game | Mar 13, 2025 | Multimodal Reasoning, Question Answering | Code Available | 1 |
| R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization | Mar 13, 2025 | Multimodal Reasoning | Code Available | 4 |
| VisualPRM: An Effective Process Reward Model for Multimodal Reasoning | Mar 13, 2025 | Multimodal Reasoning | Unverified | 0 |
| MindGYM: Enhancing Vision-Language Models via Synthetic Self-Challenging Questions | Mar 12, 2025 | Computational Efficiency, Multimodal Reasoning | Code Available | 0 |
| Oasis: One Image is All You Need for Multimodal Instruction Data Synthesis | Mar 11, 2025 | All, Dataset Generation | Code Available | 1 |