| VLM Can Be a Good Assistant: Enhancing Embodied Visual Tracking with Self-Improving Vision-Language Models | May 27, 2025 | Spatial Reasoning, Visual Tracking | —Unverified | 0 | 0 |
| VLM-R^3: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought | May 22, 2025 | Spatial Reasoning | —Unverified | 0 | 0 |
| VL-Nav: Real-time Vision-Language Navigation with Spatial Reasoning | Feb 2, 2025 | Spatial Reasoning, Vision-Language Navigation | —Unverified | 0 | 0 |
| What is needed for simple spatial language capabilities in VQA? | Aug 17, 2019 | Diagnostic, Question Answering | —Unverified | 0 | 0 |
| Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction | Oct 24, 2024 | Novel View Synthesis, Pose Estimation | —Unverified | 0 | 0 |
| Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities | Jun 20, 2024 | Spatial Reasoning, Visual Reasoning | —Unverified | 0 | 0 |
| WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences | Jun 16, 2024 | Benchmarking, Spatial Reasoning | —Unverified | 0 | 0 |
| World-aware Planning Narratives Enhance Large Vision-Language Model Planner | Jun 26, 2025 | Imitation Learning, Language Modeling | —Unverified | 0 | 0 |
| Perturbed State Space Feature Encoders for Optical Flow with Event Cameras | Apr 14, 2025 | Event-based Optical Flow, Optical Flow Estimation | —Unverified | 0 | 0 |
| REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models | Aug 5, 2024 | Question Answering, Spatial Reasoning | —Unverified | 0 | 0 |
| A Light and Smart Wearable Platform with Multimodal Foundation Model for Enhanced Spatial Reasoning in People with Blindness and Low Vision | May 16, 2025 | Large Language Model, Navigate | —Unverified | 0 | 0 |
| Leveraging LLMs for Mission Planning in Precision Agriculture | Jun 11, 2025 | Spatial Reasoning | —Unverified | 0 | 0 |
| 3D-MoE: A Mixture-of-Experts Multi-modal LLM for 3D Vision and Pose Diffusion via Rectified Flow | Jan 28, 2025 | Instruction Following, Mixture-of-Experts | —Unverified | 0 | 0 |
| 3DSRBench: A Comprehensive 3D Spatial Reasoning Benchmark | Dec 10, 2024 | Autonomous Navigation, Spatial Reasoning | —Unverified | 0 | 0 |
| A Call for New Recipes to Enhance Spatial Reasoning in MLLMs | Apr 21, 2025 | Spatial Reasoning | —Unverified | 0 | 0 |
| ActionFlow: Equivariant, Accurate, and Efficient Policies with Spatially Symmetric Flow Matching | Sep 6, 2024 | Action Generation, Spatial Reasoning | —Unverified | 0 | 0 |
| Space-LLaVA: a Vision-Language Model Adapted to Extraterrestrial Applications | Aug 12, 2024 | Instruction Following, Language Modeling | —Unverified | 0 | 0 |
| A dual contrastive framework | Dec 13, 2024 | Contrastive Learning, Decoder | —Unverified | 0 | 0 |
| Advancing Egocentric Video Question Answering with Multimodal Large Language Models | Apr 6, 2025 | Object Recognition, Question Answering | —Unverified | 0 | 0 |
| AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations | Apr 10, 2025 | Spatial Reasoning, Visual Grounding | —Unverified | 0 | 0 |
| Aerial Vision-and-Language Navigation via Semantic-Topo-Metric Representation Guided LLM Reasoning | Oct 11, 2024 | Language Modeling, Language Modelling | —Unverified | 0 | 0 |
| AeroVerse: UAV-Agent Benchmark Suite for Simulating, Pre-training, Finetuning, and Evaluating Aerospace Embodied World Models | Aug 28, 2024 | Spatial Reasoning, Task Planning | —Unverified | 0 | 0 |
| Aether: Geometric-Aware Unified World Modeling | Mar 24, 2025 | Dynamic Reconstruction, Prediction | —Unverified | 0 | 0 |
| Agentic 3D Scene Generation with Spatially Contextualized VLMs | May 26, 2025 | Multimodal Reasoning, Scene Generation | —Unverified | 0 | 0 |
| AI's Spatial Intelligence: Evaluating AI's Understanding of Spatial Transformations in PSVT:R and Augmented Reality | Nov 9, 2024 | Spatial Reasoning | —Unverified | 0 | 0 |