| VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search | Apr 12, 2025 | Spatial Reasoning | Unverified | 0 |
| ViTaPEs: Visuotactile Position Encodings for Cross-Modal Alignment in Multimodal Transformers | May 26, 2025 | cross-modal alignment, Position | Unverified | 0 |
| VLM Can Be a Good Assistant: Enhancing Embodied Visual Tracking with Self-Improving Vision-Language Models | May 27, 2025 | Spatial Reasoning, Visual Tracking | Unverified | 0 |
| VLM-R^3: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought | May 22, 2025 | Spatial Reasoning | Unverified | 0 |
| VL-Nav: Real-time Vision-Language Navigation with Spatial Reasoning | Feb 2, 2025 | Spatial Reasoning, Vision-Language Navigation | Unverified | 0 |
| What is needed for simple spatial language capabilities in VQA? | Aug 17, 2019 | Diagnostic, Question Answering | Unverified | 0 |
| Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction | Oct 24, 2024 | Novel View Synthesis, Pose Estimation | Unverified | 0 |
| Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities | Jun 20, 2024 | Spatial Reasoning, Visual Reasoning | Unverified | 0 |
| WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences | Jun 16, 2024 | Benchmarking, Spatial Reasoning | Unverified | 0 |
| Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model | Aug 1, 2024 | EgoSchema, Language Modeling | Unverified | 0 |
| SEM: Enhancing Spatial Understanding for Robust Robot Manipulation | May 22, 2025 | 3D geometry, Robot Manipulation | Unverified | 0 |
| ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models | Jun 26, 2025 | Spatial Reasoning, Video Generation | Unverified | 0 |
| SIRI-Bench: Challenging VLMs' Spatial Intelligence through Complex Reasoning Tasks | Jun 17, 2025 | Math, Spatial Reasoning | Unverified | 0 |
| SITE: towards Spatial Intelligence Thorough Evaluation | May 8, 2025 | Question Answering, Spatial Reasoning | Unverified | 0 |
| Situational Grounding within Multimodal Simulations | Feb 5, 2019 | Novel Concepts, Spatial Reasoning | Unverified | 0 |
| SKE-Layout: Spatial Knowledge Enhanced Layout Generation with LLMs | Jan 1, 2025 | Contrastive Learning, Image Generation | Unverified | 0 |
| Skeleton-Based Action Recognition with Spatial Reasoning and Temporal Stack Learning | May 7, 2018 | Action Recognition, Graph Neural Network | Unverified | 0 |
| Representation Learning for Grounded Spatial Reasoning | Jul 13, 2017 | reinforcement-learning, Reinforcement Learning | Code Available | 0 |
| Reframing Spatial Reasoning Evaluation in Language Models: A Real-World Simulation Benchmark for Qualitative Reasoning | May 23, 2024 | Logical Reasoning Question Answering, Spatial Reasoning | Code Available | 0 |
| In-the-wild Audio Spatialization with Flexible Text-guided Localization | Jun 1, 2025 | Spatial Reasoning | Code Available | 0 |
| VisFactor: Benchmarking Fundamental Visual Cognition in Multimodal Large Language Models | Feb 23, 2025 | Benchmarking, Spatial Reasoning | Code Available | 0 |
| Show, Don't Tell: Evaluating Large Language Models Beyond Textual Understanding with ChildPlay | Jul 12, 2024 | Spatial Reasoning | Code Available | 0 |
| Inherent limitations of LLMs regarding spatial information | Dec 5, 2023 | Spatial Reasoning | Code Available | 0 |
| A Trajectory Calculus for Qualitative Spatial Reasoning Using Answer Set Programming | Apr 19, 2018 | Spatial Reasoning | Code Available | 0 |
| ImplicitQA: Going beyond frames towards Implicit Video Reasoning | Jun 26, 2025 | Spatial Reasoning | Code Available | 0 |