| Leveraging LLMs for Mission Planning in Precision Agriculture | Jun 11, 2025 | Spatial Reasoning | Unverified | 0 |
| A Multi-Modal Spatial Risk Framework for EV Charging Infrastructure Using Remote Sensing | Jun 10, 2025 | Spatial Reasoning | Unverified | 0 |
| PhyBlock: A Progressive Benchmark for Physical Understanding and Planning via 3D Block Assembly | Jun 10, 2025 | Question Answering, Scene Understanding | Unverified | 0 |
| Direct Numerical Layout Generation for 3D Indoor Scene Synthesis via Spatial Reasoning | Jun 5, 2025 | In-Context Learning, Indoor Scene Synthesis | Unverified | 0 |
| From Objects to Anywhere: A Holistic Benchmark for Multi-level Visual Grounding in 3D Scenes | Jun 5, 2025 | 3D visual grounding, Object | Unverified | 0 |
| RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics | Jun 4, 2025 | Spatial Reasoning | Unverified | 0 |
| SAVVY: Spatial Awareness via Audio-Visual LLMs through Seeing and Hearing | Jun 4, 2025 | Spatial Reasoning | Unverified | 0 |
| ReSpace: Text-Driven 3D Scene Synthesis and Editing with Preference Alignment | Jun 3, 2025 | Indoor Scene Synthesis, Object | Unverified | 0 |
| OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models | Jun 3, 2025 | Object Counting, Spatial Reasoning | Unverified | 0 |
| In-the-wild Audio Spatialization with Flexible Text-guided Localization | Jun 1, 2025 | Spatial Reasoning | Code Available | 0 |
| Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors | May 30, 2025 | 3D geometry, Large Language Model | Code Available | 0 |
| Out of Sight, Not Out of Context? Egocentric Spatial Reasoning in VLMs Across Disjoint Frames | May 30, 2025 | Object, Spatial Reasoning | Unverified | 0 |
| Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces | May 30, 2025 | Spatial Reasoning | Unverified | 0 |
| Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence | May 29, 2025 | Spatial Reasoning | Unverified | 0 |
| MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence | May 29, 2025 | Multiple-choice, Spatial Reasoning | Unverified | 0 |
| Grounded Reinforcement Learning for Visual Reasoning | May 29, 2025 | reinforcement-learning, Reinforcement Learning | Code Available | 0 |
| Jigsaw-Puzzles: From Seeing to Understanding to Reasoning in Vision-Language Models | May 27, 2025 | Diagnostic, Spatial Reasoning | Unverified | 0 |
| VLM Can Be a Good Assistant: Enhancing Embodied Visual Tracking with Self-Improving Vision-Language Models | May 27, 2025 | Spatial Reasoning, Visual Tracking | Unverified | 0 |
| MEBench: A Novel Benchmark for Understanding Mutual Exclusivity Bias in Vision-Language Models | May 26, 2025 | Spatial Reasoning | Unverified | 0 |
| Agentic 3D Scene Generation with Spatially Contextualized VLMs | May 26, 2025 | Multimodal Reasoning, Scene Generation | Unverified | 0 |
| ViTaPEs: Visuotactile Position Encodings for Cross-Modal Alignment in Multimodal Transformers | May 26, 2025 | cross-modal alignment, Position | Unverified | 0 |
| Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps | May 24, 2025 | Scene Understanding, Spatial Reasoning | Unverified | 0 |
| Towards Dynamic 3D Reconstruction of Hand-Instrument Interaction in Ophthalmic Surgery | May 23, 2025 | 3D Reconstruction, Hand Pose Estimation | Unverified | 0 |
| U2-BENCH: Benchmarking Large Vision-Language Models on Ultrasound Understanding | May 23, 2025 | Benchmarking, Spatial Reasoning | Unverified | 0 |
| Bridging the Dynamic Perception Gap: Training-Free Draft Chain-of-Thought for Dynamic Multimodal Spatial Reasoning | May 22, 2025 | Spatial Reasoning | Code Available | 0 |
| MEgoHand: Multimodal Egocentric Hand-Object Interaction Motion Generation | May 22, 2025 | Motion Generation, Object | Unverified | 0 |
| VLM-R^3: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought | May 22, 2025 | Spatial Reasoning | Unverified | 0 |
| DetailMaster: Can Your Text-to-Image Model Handle Long Prompts? | May 22, 2025 | Attribute, Spatial Reasoning | Code Available | 0 |
| SEM: Enhancing Spatial Understanding for Robust Robot Manipulation | May 22, 2025 | 3D geometry, Robot Manipulation | Unverified | 0 |
| SPaRC: A Spatial Pathfinding Reasoning Challenge | May 22, 2025 | Spatial Reasoning | Code Available | 0 |
| MMMR: Benchmarking Massive Multi-Modal Reasoning Tasks | May 22, 2025 | Benchmarking, Spatial Reasoning | Unverified | 0 |
| STAR-R1: Spatial TrAnsformation Reasoning by Reinforcing Multimodal LLMs | May 21, 2025 | Efficient Exploration, Reinforcement Learning (RL) | Code Available | 0 |
| ReGUIDE: Data Efficient GUI Grounding via Spatial Reasoning and Search | May 21, 2025 | Spatial Reasoning | Unverified | 0 |
| SPhyR: Spatial-Physical Reasoning Benchmark on Material Distribution | May 21, 2025 | Spatial Reasoning | Code Available | 0 |
| Towards Embodied Cognition in Robots via Spatially Grounded Synthetic Worlds | May 20, 2025 | Spatial Reasoning | Unverified | 0 |
| From Templates to Natural Language: Generalization Challenges in Instruction-Tuned LLMs for Spatial Reasoning | May 20, 2025 | Spatial Reasoning | Unverified | 0 |
| Incentivizing Multimodal Reasoning in Large Models for Direct Robot Manipulation | May 19, 2025 | Multimodal Reasoning, Robot Manipulation | Unverified | 0 |
| SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning | May 18, 2025 | Knowledge Distillation, Spatial Reasoning | Unverified | 0 |
| Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind | May 18, 2025 | Benchmarking, Scene Understanding | Unverified | 0 |
| Are Multimodal Large Language Models Ready for Omnidirectional Spatial Reasoning? | May 17, 2025 | Hallucination, Object Counting | Unverified | 0 |
| PRS-Med: Position Reasoning Segmentation with Vision-Language Model in Medical Imaging | May 17, 2025 | Image Segmentation, Language Modeling | Unverified | 0 |
| A Light and Smart Wearable Platform with Multimodal Foundation Model for Enhanced Spatial Reasoning in People with Blindness and Low Vision | May 16, 2025 | Large Language Model, Navigate | Unverified | 0 |
| SpatialPrompting: Keyframe-driven Zero-Shot Spatial Reasoning with Off-the-Shelf Multimodal Large Language Models | May 8, 2025 | Spatial Reasoning | Unverified | 0 |
| SITE: towards Spatial Intelligence Thorough Evaluation | May 8, 2025 | Question Answering, Spatial Reasoning | Unverified | 0 |
| Preliminary Explorations with GPT-4o(mni) Native Image Generation | May 6, 2025 | Image Generation, multimodal generation | Unverified | 0 |
| Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models | May 3, 2025 | Diagnostic, Object Recognition | Unverified | 0 |
| FreeInsert: Disentangled Text-Guided Object Insertion in 3D Gaussian Scene without Spatial Priors | May 2, 2025 | Object, Spatial Reasoning | Unverified | 0 |
| SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models | May 1, 2025 | Spatial Reasoning, Visual Question Answering (VQA) | Unverified | 0 |
| First Order Logic with Fuzzy Semantics for Describing and Recognizing Nerves in Medical Images | Apr 30, 2025 | Spatial Reasoning | Unverified | 0 |
| SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning | Apr 28, 2025 | Question Answering, Spatial Reasoning | Unverified | 0 |