| Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces | May 30, 2025 | Spatial Reasoning | Unverified | 0 |
| Out of Sight, Not Out of Context? Egocentric Spatial Reasoning in VLMs Across Disjoint Frames | May 30, 2025 | Object, Spatial Reasoning | Unverified | 0 |
| Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors | May 30, 2025 | 3D Geometry, Large Language Model | Code Available | 0 |
| VideoCAD: A Large-Scale Video Dataset for Learning UI Interactions and 3D Reasoning from CAD Software | May 30, 2025 | Question Answering, Spatial Reasoning | Code Available | 1 |
| Seeing is Not Reasoning: MVPBench for Graph-based Evaluation of Multi-path Visual Physical CoT | May 30, 2025 | Spatial Reasoning, Visual Reasoning | Code Available | 1 |
| Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence | May 29, 2025 | Spatial Reasoning | Unverified | 0 |
| ThinkGeo: Evaluating Tool-Augmented Agents for Remote Sensing Tasks | May 29, 2025 | Spatial Reasoning | Code Available | 2 |
| Uni-MuMER: Unified Multi-Task Fine-Tuning of Vision-Language Model for Handwritten Mathematical Expression Recognition | May 29, 2025 | Handwritten Mathematical Expression Recognition, Language Modeling | Code Available | 1 |
| Grounded Reinforcement Learning for Visual Reasoning | May 29, 2025 | Reinforcement Learning | Code Available | 0 |
| MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence | May 29, 2025 | Multiple-choice, Spatial Reasoning | Unverified | 0 |