| Bridging the Dynamic Perception Gap: Training-Free Draft Chain-of-Thought for Dynamic Multimodal Spatial Reasoning | May 22, 2025 | Spatial Reasoning | Code Available | 0 | 5 |
| Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence | May 29, 2025 | Spatial Reasoning | Unverified | 0 | 0 |
| SpatialPrompting: Keyframe-driven Zero-Shot Spatial Reasoning with Off-the-Shelf Multimodal Large Language Models | May 8, 2025 | Spatial Reasoning | Unverified | 0 | 0 |
| Spatial-RAG: Spatial Retrieval Augmented Generation for Real-World Geospatial Reasoning Questions | Feb 4, 2025 | Question Answering, RAG | Unverified | 0 | 0 |
| Spatial Reasoner: A 3D Inference Pipeline for XR Applications | Apr 25, 2025 | Spatial Reasoning | Unverified | 0 | 0 |
| SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning | Apr 28, 2025 | Question Answering, Spatial Reasoning | Unverified | 0 | 0 |
| Spatial Reasoning and Planning for Deep Embodied Agents | Sep 28, 2024 | Autonomous Driving, Minecraft | Unverified | 0 | 0 |
| Spatial Reasoning for Few-Shot Object Detection | Nov 2, 2022 | Data Augmentation, Few-Shot Object Detection | Unverified | 0 | 0 |
| Spatial Reasoning from Natural Language Instructions for Robot Manipulation | Dec 26, 2020 | Robot Manipulation, Spatial Reasoning | Unverified | 0 | 0 |
| Spatial Symmetry Driven Pruning Strategies for Efficient Declarative Spatial Reasoning | Jun 16, 2015 | Spatial Reasoning | Unverified | 0 | 0 |
| SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities | Jan 22, 2024 | Question Answering, Spatial Reasoning | Unverified | 0 | 0 |
| SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning | May 18, 2025 | Knowledge Distillation, Spatial Reasoning | Unverified | 0 | 0 |
| Stacked Latent Attention for Multimodal Reasoning | Jun 1, 2018 | Image Captioning, Multimodal Reasoning | Unverified | 0 | 0 |
| StarCraftImage: A Dataset For Prototyping Spatial Reasoning Methods For Multi-Agent Environments | Jan 9, 2024 | Imputation, Reinforcement Learning (RL) | Unverified | 0 | 0 |
| Statistical applications of the 20/60/20 rule in risk management and portfolio optimization | Mar 19, 2025 | Management, Portfolio Optimization | Unverified | 0 | 0 |
| STMA: A Spatio-Temporal Memory Agent for Long-Horizon Embodied Task Planning | Feb 14, 2025 | Decision Making, Spatial Reasoning | Unverified | 0 | 0 |
| Stride and Translation Invariance in CNNs | Mar 18, 2021 | Data Augmentation, image-classification | Unverified | 0 | 0 |
| Structured Spatial Reasoning with Open Vocabulary Object Detectors | Oct 9, 2024 | Object, Object Rearrangement | Unverified | 0 | 0 |
| ST-VLM: Kinematic Instruction Tuning for Spatio-Temporal Reasoning in Vision-Language Models | Mar 25, 2025 | 4D reconstruction, Autonomous Driving | Unverified | 0 | 0 |
| Tag Map: A Text-Based Map for Spatial Reasoning and Navigation with Large Language Models | Sep 23, 2024 | Common Sense Reasoning, Spatial Reasoning | Unverified | 0 | 0 |
| Talking about the Moving Image: A Declarative Model for Image Schema Based Embodied Perception Grounding and Language Generation | Aug 13, 2015 | Spatial Reasoning, Text Generation | Unverified | 0 | 0 |
| Testing GPT-4-o1-preview on math and science problems: A follow-up study | Oct 11, 2024 | Math, Spatial Reasoning | Unverified | 0 | 0 |
| TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation | Nov 25, 2024 | Spatial Reasoning | Unverified | 0 | 0 |
| Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering | Sep 21, 2022 | Image Captioning, Optical Character Recognition (OCR) | Unverified | 0 | 0 |
| Towards Dynamic 3D Reconstruction of Hand-Instrument Interaction in Ophthalmic Surgery | May 23, 2025 | 3D Reconstruction, Hand Pose Estimation | Unverified | 0 | 0 |
| Towards Embodied Cognition in Robots via Spatially Grounded Synthetic Worlds | May 20, 2025 | Spatial Reasoning | Unverified | 0 | 0 |
| Towards Grounded Visual Spatial Reasoning in Multi-Modal Vision Language Models | Aug 18, 2023 | Image-text matching, Object Localization | Unverified | 0 | 0 |
| Towards Navigation by Reasoning over Spatial Configurations | May 14, 2021 | Spatial Reasoning | Unverified | 0 | 0 |
| Towards Visual Text Grounding of Multimodal Large Language Model | Apr 7, 2025 | Benchmarking, Language Modeling | Unverified | 0 | 0 |
| U2-BENCH: Benchmarking Large Vision-Language Models on Ultrasound Understanding | May 23, 2025 | Benchmarking, Spatial Reasoning | Unverified | 0 | 0 |
| UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction | Mar 19, 2025 | Navigate, Spatial Reasoning | Unverified | 0 | 0 |
| Unifying Map and Landmark Based Representations for Visual Navigation | Dec 21, 2017 | Navigate, Spatial Reasoning | Unverified | 0 | 0 |
| Unsupervised Representation Learning Facilitates Human-like Spatial Reasoning | Oct 12, 2021 | Representation Learning, Spatial Reasoning | Unverified | 0 | 0 |
| Video Perception Models for 3D Scene Synthesis | Jun 25, 2025 | 3D Reconstruction, Image Generation | Unverified | 0 | 0 |
| VideoSAVi: Self-Aligned Video Language Models without Human Supervision | Dec 1, 2024 | EgoSchema, MVBench | Unverified | 0 | 0 |
| VisionArena: 230K Real World User-VLM Conversations with Preference Labels | Dec 11, 2024 | Chatbot, Spatial Reasoning | Unverified | 0 | 0 |
| Vision-Integrated LLMs for Autonomous Driving Assistance: Human Performance Comparison and Trust Evaluation | Feb 6, 2025 | Autonomous Driving, Decision Making | Unverified | 0 | 0 |
| Visual Agentic AI for Spatial Reasoning with a Dynamic API | Feb 10, 2025 | Program Synthesis, Spatial Reasoning | Unverified | 0 | 0 |
| VisualEchoes: Spatial Image Representation Learning through Echolocation | May 4, 2020 | Depth Estimation, Monocular Depth Estimation | Unverified | 0 | 0 |
| Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces | May 30, 2025 | Spatial Reasoning | Unverified | 0 | 0 |
| Visual-Linguistic Agent: Towards Collaborative Contextual Object Reasoning | Nov 15, 2024 | Descriptive, Object | Unverified | 0 | 0 |
| VisualPuzzles: Decoupling Multimodal Reasoning Evaluation from Domain Knowledge | Apr 14, 2025 | Logical Reasoning, Multimodal Reasoning | Unverified | 0 | 0 |
| VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search | Apr 12, 2025 | Spatial Reasoning | Unverified | 0 | 0 |
| ViTaPEs: Visuotactile Position Encodings for Cross-Modal Alignment in Multimodal Transformers | May 26, 2025 | cross-modal alignment, Position | Unverified | 0 | 0 |
| VLM Can Be a Good Assistant: Enhancing Embodied Visual Tracking with Self-Improving Vision-Language Models | May 27, 2025 | Spatial Reasoning, Visual Tracking | Unverified | 0 | 0 |
| VLM-R^3: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought | May 22, 2025 | Spatial Reasoning | Unverified | 0 | 0 |
| VL-Nav: Real-time Vision-Language Navigation with Spatial Reasoning | Feb 2, 2025 | Spatial Reasoning, Vision-Language Navigation | Unverified | 0 | 0 |
| What is needed for simple spatial language capabilities in VQA? | Aug 17, 2019 | Diagnostic, Question Answering | Unverified | 0 | 0 |
| Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction | Oct 24, 2024 | Novel View Synthesis, Pose Estimation | Unverified | 0 | 0 |
| Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities | Jun 20, 2024 | Spatial Reasoning, Visual Reasoning | Unverified | 0 | 0 |