| VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions | Mar 11, 2020 | Human-Object Interaction DetectionObject | CodeCode Available | 1 | 5 |
| Warehouse Spatial Question Answering with LLM Agent | Jul 14, 2025 | Question AnsweringSpatial Reasoning | CodeCode Available | 1 | 5 |
| CityEQA: A Hierarchical LLM Agent on Embodied Question Answering Benchmark in City Space | Feb 18, 2025 | Embodied Question AnsweringQuestion Answering | CodeCode Available | 1 | 5 |
| Unfolding Spatial Cognition: Evaluating Multimodal Models on Visual Simulations | Jun 5, 2025 | 4kSpatial Reasoning | CodeCode Available | 1 | 5 |
| Uni-MuMER: Unified Multi-Task Fine-Tuning of Vision-Language Model for Handwritten Mathematical Expression Recognition | May 29, 2025 | Handwritten Mathmatical Expression RecognitionLanguage Modeling | CodeCode Available | 1 | 5 |
| On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability | Sep 30, 2024 | Decision MakingManagement | CodeCode Available | 1 | 5 |
| ING-VP: MLLMs cannot Play Easy Vision-based Games Yet | Oct 9, 2024 | Spatial Reasoning | CodeCode Available | 1 | 5 |
| Open3DVQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space | Mar 14, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 1 | 5 |
| Improved Visual-Spatial Reasoning via R1-Zero-Like Training | Apr 1, 2025 | GPUSpatial Reasoning | CodeCode Available | 1 | 5 |
| Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities | Oct 22, 2024 | Spatial Reasoning | CodeCode Available | 1 | 5 |
| IndoNLI: A Natural Language Inference Dataset for Indonesian | Oct 27, 2021 | Natural Language InferenceSentence | CodeCode Available | 1 | 5 |
| Towards Visuospatial Cognition via Hierarchical Fusion of Visual Experts | May 18, 2025 | Spatial Reasoning | CodeCode Available | 1 | 5 |
| TopViewRS: Vision-Language Models as Top-View Spatial Reasoners | Jun 4, 2024 | Multiple-choiceSpatial Reasoning | CodeCode Available | 1 | 5 |
| ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension | Apr 12, 2022 | image-classificationImage Classification | CodeCode Available | 1 | 5 |
| Capturing Shape Information with Multi-Scale Topological Loss Terms for 3D Reconstruction | Mar 3, 2022 | 3D ReconstructionSpatial Reasoning | CodeCode Available | 1 | 5 |
| Touchdown: Natural Language Navigation and Spatial Reasoning in Visual Street Environments | Nov 29, 2018 | PositionSpatial Reasoning | CodeCode Available | 1 | 5 |
| Translating Natural Language to Planning Goals with Large-Language Models | Feb 10, 2023 | Spatial ReasoningTranslation | CodeCode Available | 1 | 5 |
| Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization | Apr 25, 2025 | Spatial Reasoning | CodeCode Available | 1 | 5 |
| Teaching Agents how to Map: Spatial Reasoning for Multi-Object Navigation | Jul 13, 2021 | Reinforcement Learning (RL)Spatial Reasoning | CodeCode Available | 1 | 5 |
| Are Deep Neural Networks SMARTer than Second Graders? | Dec 20, 2022 | Language ModellingMeta-Learning | CodeCode Available | 1 | 5 |
| Grounding Consistency: Distilling Spatial Common Sense for Precise Visual Relationship Detection | Jan 1, 2021 | Common Sense ReasoningGraph Generation | CodeCode Available | 1 | 5 |
| SpatialSense: An Adversarially Crowdsourced Benchmark for Spatial Relation Recognition | Aug 7, 2019 | BenchmarkingRelation | CodeCode Available | 1 | 5 |
| StepGame: A New Benchmark for Robust Multi-Hop Spatial Reasoning in Texts | Apr 18, 2022 | Question AnsweringSpatial Reasoning | CodeCode Available | 1 | 5 |
| Enhancing Reasoning to Adapt Large Language Models for Domain-Specific Applications | Feb 5, 2025 | In-Context LearningLanguage Modeling | CodeCode Available | 1 | 5 |
| SPARTQA: A Textual Question Answering Benchmark for Spatial Reasoning | Jun 1, 2021 | Question AnsweringSpatial Reasoning | CodeCode Available | 1 | 5 |
| Spatially Aware Multimodal Transformers for TextVQA | Jul 23, 2020 | Optical Character Recognition (OCR)Spatial Reasoning | CodeCode Available | 1 | 5 |
| End-to-End Egospheric Spatial Memory | Feb 15, 2021 | General Reinforcement LearningImitation Learning | CodeCode Available | 1 | 5 |
| Can Large Language Models be Good Path Planners? A Benchmark and Investigation on Spatial-temporal Reasoning | Oct 5, 2023 | NavigateSpatial Reasoning | CodeCode Available | 1 | 5 |
| Geospatial Mechanistic Interpretability of Large Language Models | May 6, 2025 | Spatial Reasoning | CodeCode Available | 1 | 5 |
| MineAnyBuild: Benchmarking Spatial Planning for Open-world AI Agents | May 26, 2025 | BenchmarkingMinecraft | CodeCode Available | 1 | 5 |
| SPARE3D: A Dataset for SPAtial REasoning on Three-View Line Drawings | Mar 31, 2020 | Spatial Reasoning | CodeCode Available | 1 | 5 |
| SpartQA: : A Textual Question Answering Benchmark for Spatial Reasoning | Apr 12, 2021 | Question AnsweringSpatial Reasoning | CodeCode Available | 1 | 5 |
| VideoCAD: A Large-Scale Video Dataset for Learning UI Interactions and 3D Reasoning from CAD Software | May 30, 2025 | Question AnsweringSpatial Reasoning | CodeCode Available | 1 | 5 |
| From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation | May 13, 2025 | Robot ManipulationSpatial Reasoning | CodeCode Available | 1 | 5 |
| Show, Don't Tell: Evaluating Large Language Models Beyond Textual Understanding with ChildPlay | Jul 12, 2024 | Spatial Reasoning | CodeCode Available | 0 | 5 |
| Bridging the Dynamic Perception Gap: Training-Free Draft Chain-of-Thought for Dynamic Multimodal Spatial Reasoning | May 22, 2025 | Spatial Reasoning | CodeCode Available | 0 | 5 |
| Scaling RL to Long Videos | Jul 10, 2025 | Reinforcement Learning (RL)Spatial Reasoning | CodeCode Available | 0 | 5 |
| SORNet: Spatial Object-Centric Representations for Sequential Manipulation | Sep 8, 2021 | ObjectRelation Classification | CodeCode Available | 0 | 5 |
| EgoHumans: An Egocentric 3D Multi-Human Benchmark | May 25, 2023 | 3D Pose EstimationHuman Detection | CodeCode Available | 0 | 5 |
| Representation Learning for Grounded Spatial Reasoning | Jul 13, 2017 | reinforcement-learningReinforcement Learning | CodeCode Available | 0 | 5 |
| Reframing Spatial Reasoning Evaluation in Language Models: A Real-World Simulation Benchmark for Qualitative Reasoning | May 23, 2024 | Logical Reasoning Question AnsweringSpatial Reasoning | CodeCode Available | 0 | 5 |
| OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding | Jul 10, 2025 | Scene UnderstandingSpatial Reasoning | CodeCode Available | 0 | 5 |
| Disentangling Extraction and Reasoning in Multi-hop Spatial Reasoning | Oct 25, 2023 | Spatial Reasoning | CodeCode Available | 0 | 5 |
| Neuro-symbolic Training for Reasoning over Spatial Language | Jun 19, 2024 | Spatial ReasoningTransfer Learning | CodeCode Available | 0 | 5 |
| DetailMaster: Can Your Text-to-Image Model Handle Long Prompts? | May 22, 2025 | AttributeSpatial Reasoning | CodeCode Available | 0 | 5 |
| No Blind Spots: Full-Surround Multi-Object Tracking for Autonomous Vehicles using Cameras & LiDARs | Feb 23, 2018 | Autonomous VehiclesMulti-Object Tracking | CodeCode Available | 0 | 5 |
| Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark | Oct 6, 2024 | Mathematical ReasoningSpatial Reasoning | CodeCode Available | 0 | 5 |
| DepWiGNN: A Depth-wise Graph Neural Network for Multi-hop Spatial Reasoning in Text | Oct 19, 2023 | Graph Neural NetworkSpatial Reasoning | CodeCode Available | 0 | 5 |
| Dense 2D-3D Indoor Prediction with Sound via Aligned Cross-Modal Distillation | Sep 20, 2023 | 3D Scene ReconstructionDepth Estimation | CodeCode Available | 0 | 5 |
| 3D CoCa: Contrastive Learners are 3D Captioners | Apr 13, 2025 | 3D dense captioningCaption Generation | CodeCode Available | 0 | 5 |