| AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO | Feb 20, 2025 | Autonomous Navigation, Navigate | Code Available | 2 | 5 |
| Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead | Mar 31, 2025 | Math, Spatial Reasoning | Code Available | 2 | 5 |
| Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes | Aug 17, 2023 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| SpartQA: A Textual Question Answering Benchmark for Spatial Reasoning | Apr 12, 2021 | Question Answering, Spatial Reasoning | Code Available | 1 | 5 |
| SPARE3D: A Dataset for SPAtial REasoning on Three-View Line Drawings | Mar 31, 2020 | Spatial Reasoning | Code Available | 1 | 5 |
| SPARTQA: A Textual Question Answering Benchmark for Spatial Reasoning | Jun 1, 2021 | Question Answering, Spatial Reasoning | Code Available | 1 | 5 |
| BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues | Oct 20, 2020 | Spatial Reasoning | Code Available | 1 | 5 |
| iVISPAR -- An Interactive Visual-Spatial Reasoning Benchmark for VLMs | Feb 5, 2025 | Spatial Reasoning | Code Available | 1 | 5 |
| Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame Benchmark | Jan 8, 2024 | Relation Mapping, Spatial Reasoning | Code Available | 1 | 5 |
| Knot So Simple: A Minimalistic Environment for Spatial Reasoning | May 23, 2025 | Model Predictive Control, Spatial Reasoning | Code Available | 1 | 5 |
| Learning and Reasoning with the Graph Structure Representation in Robotic Surgery | Jul 7, 2020 | Edge Classification, Graph Generation | Code Available | 1 | 5 |
| Learning Action and Reasoning-Centric Image Editing from Videos and Simulations | Jul 3, 2024 | Attribute, Spatial Reasoning | Code Available | 1 | 5 |
| Joint Spatio-Textual Reasoning for Answering Tourism Questions | Sep 28, 2020 | Spatial Reasoning | Code Available | 1 | 5 |
| ING-VP: MLLMs cannot Play Easy Vision-based Games Yet | Oct 9, 2024 | Spatial Reasoning | Code Available | 1 | 5 |
| An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models | Nov 9, 2024 | object-detection, Object Detection | Code Available | 1 | 5 |
| SmartPlay: A Benchmark for LLMs as Intelligent Agents | Oct 2, 2023 | Minecraft, Spatial Reasoning | Code Available | 1 | 5 |
| IndoNLI: A Natural Language Inference Dataset for Indonesian | Oct 27, 2021 | Natural Language Inference, Sentence | Code Available | 1 | 5 |
| A Universal Semantic-Geometric Representation for Robotic Manipulation | Jun 18, 2023 | 3D geometry, Robot Manipulation | Code Available | 1 | 5 |
| SE-KGE: A Location-Aware Knowledge Graph Embedding Model for Geographic Question Answering and Spatial Semantic Lifting | Apr 25, 2020 | Geographic Question Answering, Graph Embedding | Code Available | 1 | 5 |
| HSPFormer: Hierarchical Spatial Perception Transformer for Semantic Segmentation | Jan 16, 2025 | Depth Estimation, Monocular Depth Estimation | Code Available | 1 | 5 |
| DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding | May 10, 2024 | Relation, Spatial Reasoning | Code Available | 1 | 5 |
| Improved Visual-Spatial Reasoning via R1-Zero-Like Training | Apr 1, 2025 | GPU, Spatial Reasoning | Code Available | 1 | 5 |
| LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments | Feb 26, 2024 | Spatial Reasoning | Code Available | 1 | 5 |
| SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding | Apr 17, 2025 | Image Generation, Large Language Model | Code Available | 1 | 5 |
| Spatially Aware Multimodal Transformers for TextVQA | Jul 23, 2020 | Optical Character Recognition (OCR), Spatial Reasoning | Code Available | 1 | 5 |
| ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting | Oct 23, 2024 | Decision Making, Minecraft | Code Available | 1 | 5 |
| 3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation | Jun 11, 2025 | Spatial Reasoning | Code Available | 1 | 5 |
| Revisiting spatio-temporal layouts for compositional action recognition | Nov 2, 2021 | Action Classification, Action Detection | Code Available | 1 | 5 |
| CoNav: Collaborative Cross-Modal Reasoning for Embodied Navigation | May 22, 2025 | Scene Understanding, Spatial Reasoning | Code Available | 1 | 5 |
| AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding | Jun 19, 2024 | Question Answering, Spatial Reasoning | Code Available | 1 | 5 |
| GuessWhat?! Visual object discovery through multi-modal dialogue | Nov 23, 2016 | Object, Object Discovery | Code Available | 1 | 5 |
| SBEVNet: End-to-End Deep Stereo Layout Estimation | May 25, 2021 | Depth Estimation, Disparity Estimation | Code Available | 1 | 5 |
| From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation | May 13, 2025 | Robot Manipulation, Spatial Reasoning | Code Available | 1 | 5 |
| Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models | Feb 12, 2025 | Attribute, Diagnostic | Code Available | 1 | 5 |
| CLIPort: What and Where Pathways for Robotic Manipulation | Sep 24, 2021 | Imitation Learning, Robotic Grasping | Code Available | 1 | 5 |
| Geospatial Mechanistic Interpretability of Large Language Models | May 6, 2025 | Spatial Reasoning | Code Available | 1 | 5 |
| CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory | May 8, 2025 | Large Language Model, Navigate | Code Available | 1 | 5 |
| CityGPT: Empowering Urban Spatial Cognition of Large Language Models | Jun 20, 2024 | Code Generation, Math | Code Available | 1 | 5 |
| ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension | Apr 12, 2022 | image-classification, Image Classification | Code Available | 1 | 5 |
| Seeing is Not Reasoning: MVPBench for Graph-based Evaluation of Multi-path Visual Physical CoT | May 30, 2025 | Spatial Reasoning, Visual Reasoning | Code Available | 1 | 5 |
| CityEQA: A Hierarchical LLM Agent on Embodied Question Answering Benchmark in City Space | Feb 18, 2025 | Embodied Question Answering, Question Answering | Code Available | 1 | 5 |
| Grounded Chain-of-Thought for Multimodal Large Language Models | Mar 17, 2025 | Hallucination, Spatial Reasoning | Code Available | 1 | 5 |
| Self-supervised Spatial Reasoning on Multi-View Line Drawings | Apr 27, 2021 | Binary Classification, Contrastive Learning | Code Available | 1 | 5 |
| Grounding Consistency: Distilling Spatial Common Sense for Precise Visual Relationship Detection | Jan 1, 2021 | Common Sense Reasoning, Graph Generation | Code Available | 1 | 5 |
| Enhancing Reasoning to Adapt Large Language Models for Domain-Specific Applications | Feb 5, 2025 | In-Context Learning, Language Modeling | Code Available | 1 | 5 |
| OpenKD: Opening Prompt Diversity for Zero- and Few-shot Keypoint Detection | Sep 30, 2024 | Diversity, Keypoint Detection | Code Available | 1 | 5 |
| Open3DVQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space | Mar 14, 2025 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| Pix2Shape: Towards Unsupervised Learning of 3D Scenes from Images using a View-based Representation | Mar 23, 2020 | Decoder, Spatial Reasoning | Code Available | 1 | 5 |
| Decoding Language Spatial Relations to 2D Spatial Arrangements | Nov 1, 2020 | Spatial Reasoning | Code Available | 1 | 5 |
| NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models | Mar 17, 2025 | Question Answering, Scene Understanding | Code Available | 1 | 5 |