| Towards Dynamic 3D Reconstruction of Hand-Instrument Interaction in Ophthalmic Surgery | May 23, 2025 | 3D Reconstruction, Hand Pose Estimation | Unverified | 0 |
| Knot So Simple: A Minimalistic Environment for Spatial Reasoning | May 23, 2025 | Model Predictive Control, Spatial Reasoning | Code Available | 1 |
| Bridging the Dynamic Perception Gap: Training-Free Draft Chain-of-Thought for Dynamic Multimodal Spatial Reasoning | May 22, 2025 | Spatial Reasoning | Code Available | 0 |
| MMMR: Benchmarking Massive Multi-Modal Reasoning Tasks | May 22, 2025 | Benchmarking, Spatial Reasoning | Unverified | 0 |
| SPaRC: A Spatial Pathfinding Reasoning Challenge | May 22, 2025 | Spatial Reasoning | Code Available | 0 |
| VLM-R^3: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought | May 22, 2025 | Spatial Reasoning | Unverified | 0 |
| DetailMaster: Can Your Text-to-Image Model Handle Long Prompts? | May 22, 2025 | Attribute, Spatial Reasoning | Code Available | 0 |
| MEgoHand: Multimodal Egocentric Hand-Object Interaction Motion Generation | May 22, 2025 | Motion Generation, Object | Unverified | 0 |
| CoNav: Collaborative Cross-Modal Reasoning for Embodied Navigation | May 22, 2025 | Scene Understanding, Spatial Reasoning | Code Available | 1 |
| SEM: Enhancing Spatial Understanding for Robust Robot Manipulation | May 22, 2025 | 3D geometry, Robot Manipulation | Unverified | 0 |
| GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning | May 22, 2025 | Attribute, Image Generation | Code Available | 2 |
| SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding | May 22, 2025 | Motion Estimation, Question Answering | Code Available | 2 |
| SPhyR: Spatial-Physical Reasoning Benchmark on Material Distribution | May 21, 2025 | Spatial Reasoning | Code Available | 0 |
| ReGUIDE: Data Efficient GUI Grounding via Spatial Reasoning and Search | May 21, 2025 | Spatial Reasoning | Unverified | 0 |
| STAR-R1: Spacial TrAnsformation Reasoning by Reinforcing Multimodal LLMs | May 21, 2025 | Efficient Exploration, Reinforcement Learning (RL) | Code Available | 0 |
| From Templates to Natural Language: Generalization Challenges in Instruction-Tuned LLMs for Spatial Reasoning | May 20, 2025 | Spatial Reasoning | Unverified | 0 |
| Towards Embodied Cognition in Robots via Spatially Grounded Synthetic Worlds | May 20, 2025 | Spatial Reasoning | Unverified | 0 |
| Incentivizing Multimodal Reasoning in Large Models for Direct Robot Manipulation | May 19, 2025 | Multimodal Reasoning, Robot Manipulation | Unverified | 0 |
| SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning | May 18, 2025 | Knowledge Distillation, Spatial Reasoning | Unverified | 0 |
| Visuospatial Cognitive Assistant | May 18, 2025 | Spatial Reasoning | Code Available | 1 |
| Towards Visuospatial Cognition via Hierarchical Fusion of Visual Experts | May 18, 2025 | Spatial Reasoning | Code Available | 1 |
| Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind | May 18, 2025 | Benchmarking, Scene Understanding | Unverified | 0 |
| PRS-Med: Position Reasoning Segmentation with Vision-Language Model in Medical Imaging | May 17, 2025 | Image Segmentation, Language Modeling | Unverified | 0 |
| Are Multimodal Large Language Models Ready for Omnidirectional Spatial Reasoning? | May 17, 2025 | Hallucination, Object Counting | Unverified | 0 |
| A Light and Smart Wearable Platform with Multimodal Foundation Model for Enhanced Spatial Reasoning in People with Blindness and Low Vision | May 16, 2025 | Large Language Model, Navigate | Unverified | 0 |
| From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation | May 13, 2025 | Robot Manipulation, Spatial Reasoning | Code Available | 1 |
| Text-to-CadQuery: A New Paradigm for CAD Generation with Scalable Large Model Capabilities | May 10, 2025 | Spatial Reasoning | Code Available | 2 |
| CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory | May 8, 2025 | Large Language Model, Navigate | Code Available | 1 |
| SpatialPrompting: Keyframe-driven Zero-Shot Spatial Reasoning with Off-the-Shelf Multimodal Large Language Models | May 8, 2025 | Spatial Reasoning | Unverified | 0 |
| SITE: towards Spatial Intelligence Thorough Evaluation | May 8, 2025 | Question Answering, Spatial Reasoning | Unverified | 0 |
| Preliminary Explorations with GPT-4o(mni) Native Image Generation | May 6, 2025 | Image Generation, Multimodal Generation | Unverified | 0 |
| Geospatial Mechanistic Interpretability of Large Language Models | May 6, 2025 | Spatial Reasoning | Code Available | 1 |
| Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models | May 3, 2025 | Diagnostic, Object Recognition | Unverified | 0 |
| FreeInsert: Disentangled Text-Guided Object Insertion in 3D Gaussian Scene without Spatial Priors | May 2, 2025 | Object, Spatial Reasoning | Unverified | 0 |
| SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models | May 1, 2025 | Spatial Reasoning, Visual Question Answering (VQA) | Unverified | 0 |
| First Order Logic with Fuzzy Semantics for Describing and Recognizing Nerves in Medical Images | Apr 30, 2025 | Spatial Reasoning | Unverified | 0 |
| SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning | Apr 28, 2025 | Question Answering, Spatial Reasoning | Unverified | 0 |
| Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization | Apr 25, 2025 | Spatial Reasoning | Code Available | 1 |
| A Review of 3D Object Detection with Vision-Language Models | Apr 25, 2025 | 3D Object Detection, Object | Unverified | 0 |
| SORT3D: Spatial Object-centric Reasoning Toolbox for Zero-Shot 3D Grounding Using Large Language Models | Apr 25, 2025 | Spatial Reasoning, Text to 3D | Code Available | 2 |
| Spatial Reasoner: A 3D Inference Pipeline for XR Applications | Apr 25, 2025 | Spatial Reasoning | Unverified | 0 |
| A Call for New Recipes to Enhance Spatial Reasoning in MLLMs | Apr 21, 2025 | Spatial Reasoning | Unverified | 0 |
| InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners | Apr 19, 2025 | Action Generation, Logical Reasoning | Code Available | 2 |
| Embodied-R: Collaborative Framework for Activating Embodied Spatial Reasoning in Foundation Models via Reinforcement Learning | Apr 17, 2025 | Multimodal Reasoning, Reinforcement Learning (RL) | Code Available | 2 |
| EarthGPT-X: Enabling MLLMs to Flexibly and Comprehensively Understand Multi-Source Remote Sensing Imagery | Apr 17, 2025 | Large Language Model, Multi-Task Learning | Unverified | 0 |
| SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding | Apr 17, 2025 | Image Generation, Large Language Model | Code Available | 1 |
| Intelligence of Things: A Spatial Context-Aware Control System for Smart Devices | Apr 16, 2025 | Spatial Reasoning | Unverified | 0 |
| LVLM_CSP: Accelerating Large Vision Language Models via Clustering, Scattering, and Pruning for Reasoning Segmentation | Apr 15, 2025 | Image Captioning, Question Answering | Unverified | 0 |
| Embodied World Models Emerge from Navigational Task in Open-Ended Environments | Apr 15, 2025 | Meta Reinforcement Learning, Spatial Reasoning | Unverified | 0 |
| A Survey of Large Language Model-Powered Spatial Intelligence Across Scales: Advances in Embodied Agents, Smart Cities, and Earth Science | Apr 14, 2025 | Language Modeling, Language Modelling | Unverified | 0 |