| Spatially Aware Multimodal Transformers for TextVQA | Jul 23, 2020 | Optical Character Recognition (OCR), Spatial Reasoning | Code Available | 1 |
| Learning and Reasoning with the Graph Structure Representation in Robotic Surgery | Jul 7, 2020 | Edge Classification, Graph Generation | Code Available | 1 |
| SE-KGE: A Location-Aware Knowledge Graph Embedding Model for Geographic Question Answering and Spatial Semantic Lifting | Apr 25, 2020 | Geographic Question Answering, Graph Embedding | Code Available | 1 |
| SPARE3D: A Dataset for SPAtial REasoning on Three-View Line Drawings | Mar 31, 2020 | Spatial Reasoning | Code Available | 1 |
| Pix2Shape: Towards Unsupervised Learning of 3D Scenes from Images using a View-based Representation | Mar 23, 2020 | Decoder, Spatial Reasoning | Code Available | 1 |
| VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions | Mar 11, 2020 | Human-Object Interaction Detection, Object | Code Available | 1 |
| SpatialSense: An Adversarially Crowdsourced Benchmark for Spatial Relation Recognition | Aug 7, 2019 | Benchmarking, Relation | Code Available | 1 |
| Touchdown: Natural Language Navigation and Spatial Reasoning in Visual Street Environments | Nov 29, 2018 | Position, Spatial Reasoning | Code Available | 1 |
| GuessWhat?! Visual object discovery through multi-modal dialogue | Nov 23, 2016 | Object, Object Discovery | Code Available | 1 |
| MindJourney: Test-Time Scaling with World Models for Spatial Reasoning | Jul 16, 2025 | Spatial Reasoning | Unverified | 0 |
| EmbRACE-3K: Embodied Reasoning and Action in Complex Environments | Jul 14, 2025 | Scene Understanding, Spatial Reasoning | Unverified | 0 |
| ByDeWay: Boost Your multimodal LLM with DEpth prompting in a Training-Free Way | Jul 11, 2025 | Depth Estimation, Hallucination | Unverified | 0 |
| M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning | Jul 11, 2025 | Spatial Reasoning | Unverified | 0 |
| Scaling RL to Long Videos | Jul 10, 2025 | Reinforcement Learning (RL), Spatial Reasoning | Unverified | 0 |
| OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding | Jul 10, 2025 | Scene Understanding, Spatial Reasoning | Code Available | 0 |
| A Neural Representation Framework with LLM-Driven Spatial Reasoning for Open-Vocabulary 3D Visual Grounding | Jul 9, 2025 | 3D visual grounding, Autonomous Navigation | Unverified | 0 |
| Optimising Language Models for Downstream Tasks: A Post-Training Perspective | Jun 26, 2025 | parameter-efficient fine-tuning, Spatial Reasoning | Unverified | 0 |
| ImplicitQA: Going beyond frames towards Implicit Video Reasoning | Jun 26, 2025 | Spatial Reasoning | Code Available | 0 |
| World-aware Planning Narratives Enhance Large Vision-Language Model Planner | Jun 26, 2025 | Imitation Learning, Language Modeling | Unverified | 0 |
| ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models | Jun 26, 2025 | Spatial Reasoning, Video Generation | Unverified | 0 |
| From 2D to 3D Cognition: A Brief Survey of General World Models | Jun 25, 2025 | Autonomous Driving, Scene Generation | Unverified | 0 |
| Video Perception Models for 3D Scene Synthesis | Jun 25, 2025 | 3D Reconstruction, Image Generation | Unverified | 0 |
| ImmerseGen: Agent-Guided Immersive World Generation with Alpha-Textured Proxies | Jun 17, 2025 | Scene Generation, Spatial Reasoning | Unverified | 0 |
| SIRI-Bench: Challenging VLMs' Spatial Intelligence through Complex Reasoning Tasks | Jun 17, 2025 | Math, Spatial Reasoning | Unverified | 0 |
| PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning | Jun 17, 2025 | General Reinforcement Learning, Multimodal Reasoning | Unverified | 0 |