| Talk2BEV: Language-enhanced Bird's-eye View Maps for Autonomous Driving | Oct 3, 2023 | Autonomous Driving, Decision Making | Code Available | 1 |
| SmartPlay: A Benchmark for LLMs as Intelligent Agents | Oct 2, 2023 | Minecraft, Spatial Reasoning | Code Available | 1 |
| DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions | Sep 7, 2023 | Position, Spatial Reasoning | Code Available | 1 |
| A Universal Semantic-Geometric Representation for Robotic Manipulation | Jun 18, 2023 | 3D geometry, Robot Manipulation | Code Available | 1 |
| Translating Natural Language to Planning Goals with Large-Language Models | Feb 10, 2023 | Spatial Reasoning, Translation | Code Available | 1 |
| Are Deep Neural Networks SMARTer than Second Graders? | Dec 20, 2022 | Language Modelling, Meta-Learning | Code Available | 1 |
| Visual Spatial Reasoning | Apr 30, 2022 | Spatial Reasoning | Code Available | 1 |
| StepGame: A New Benchmark for Robust Multi-Hop Spatial Reasoning in Texts | Apr 18, 2022 | Question Answering, Spatial Reasoning | Code Available | 1 |
| ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension | Apr 12, 2022 | image-classification, Image Classification | Code Available | 1 |
| Capturing Shape Information with Multi-Scale Topological Loss Terms for 3D Reconstruction | Mar 3, 2022 | 3D Reconstruction, Spatial Reasoning | Code Available | 1 |
| Revisiting spatio-temporal layouts for compositional action recognition | Nov 2, 2021 | Action Classification, Action Detection | Code Available | 1 |
| IndoNLI: A Natural Language Inference Dataset for Indonesian | Oct 27, 2021 | Natural Language Inference, Sentence | Code Available | 1 |
| CLIPort: What and Where Pathways for Robotic Manipulation | Sep 24, 2021 | Imitation Learning, Robotic Grasping | Code Available | 1 |
| Teaching Agents how to Map: Spatial Reasoning for Multi-Object Navigation | Jul 13, 2021 | Reinforcement Learning (RL), Spatial Reasoning | Code Available | 1 |
| SPARTQA: A Textual Question Answering Benchmark for Spatial Reasoning | Jun 1, 2021 | Question Answering, Spatial Reasoning | Code Available | 1 |
| SBEVNet: End-to-End Deep Stereo Layout Estimation | May 25, 2021 | Depth Estimation, Disparity Estimation | Code Available | 1 |
| Self-supervised Spatial Reasoning on Multi-View Line Drawings | Apr 27, 2021 | Binary Classification, Contrastive Learning | Code Available | 1 |
| SpartQA: A Textual Question Answering Benchmark for Spatial Reasoning | Apr 12, 2021 | Question Answering, Spatial Reasoning | Code Available | 1 |
| End-to-End Egospheric Spatial Memory | Feb 15, 2021 | General Reinforcement Learning, Imitation Learning | Code Available | 1 |
| Multi-scale GCN-assisted two-stage network for joint segmentation of retinal layers and disc in peripapillary OCT images | Feb 9, 2021 | Decoder, Medical Image Segmentation | Code Available | 1 |
| Grounding Consistency: Distilling Spatial Common Sense for Precise Visual Relationship Detection | Jan 1, 2021 | Common Sense Reasoning, Graph Generation | Code Available | 1 |
| Long Range Arena: A Benchmark for Efficient Transformers | Nov 8, 2020 | 16k, Benchmarking | Code Available | 1 |
| Decoding Language Spatial Relations to 2D Spatial Arrangements | Nov 1, 2020 | Spatial Reasoning | Code Available | 1 |
| BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues | Oct 20, 2020 | Spatial Reasoning | Code Available | 1 |
| Joint Spatio-Textual Reasoning for Answering Tourism Questions | Sep 28, 2020 | Spatial Reasoning | Code Available | 1 |
| Spatially Aware Multimodal Transformers for TextVQA | Jul 23, 2020 | Optical Character Recognition (OCR), Spatial Reasoning | Code Available | 1 |
| Learning and Reasoning with the Graph Structure Representation in Robotic Surgery | Jul 7, 2020 | Edge Classification, Graph Generation | Code Available | 1 |
| SE-KGE: A Location-Aware Knowledge Graph Embedding Model for Geographic Question Answering and Spatial Semantic Lifting | Apr 25, 2020 | Geographic Question Answering, Graph Embedding | Code Available | 1 |
| SPARE3D: A Dataset for SPAtial REasoning on Three-View Line Drawings | Mar 31, 2020 | Spatial Reasoning | Code Available | 1 |
| Pix2Shape: Towards Unsupervised Learning of 3D Scenes from Images using a View-based Representation | Mar 23, 2020 | Decoder, Spatial Reasoning | Code Available | 1 |
| VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions | Mar 11, 2020 | Human-Object Interaction Detection, Object | Code Available | 1 |
| SpatialSense: An Adversarially Crowdsourced Benchmark for Spatial Relation Recognition | Aug 7, 2019 | Benchmarking, Relation | Code Available | 1 |
| Touchdown: Natural Language Navigation and Spatial Reasoning in Visual Street Environments | Nov 29, 2018 | Position, Spatial Reasoning | Code Available | 1 |
| GuessWhat?! Visual object discovery through multi-modal dialogue | Nov 23, 2016 | Object, Object Discovery | Code Available | 1 |
| MindJourney: Test-Time Scaling with World Models for Spatial Reasoning | Jul 16, 2025 | Spatial Reasoning | —Unverified | 0 |
| EmbRACE-3K: Embodied Reasoning and Action in Complex Environments | Jul 14, 2025 | Scene Understanding, Spatial Reasoning | —Unverified | 0 |
| ByDeWay: Boost Your multimodal LLM with DEpth prompting in a Training-Free Way | Jul 11, 2025 | Depth Estimation, Hallucination | —Unverified | 0 |
| M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning | Jul 11, 2025 | Spatial Reasoning | —Unverified | 0 |
| Scaling RL to Long Videos | Jul 10, 2025 | Reinforcement Learning (RL), Spatial Reasoning | Code Available | 0 |
| OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding | Jul 10, 2025 | Scene Understanding, Spatial Reasoning | Code Available | 0 |
| A Neural Representation Framework with LLM-Driven Spatial Reasoning for Open-Vocabulary 3D Visual Grounding | Jul 9, 2025 | 3D visual grounding, Autonomous Navigation | —Unverified | 0 |
| Optimising Language Models for Downstream Tasks: A Post-Training Perspective | Jun 26, 2025 | parameter-efficient fine-tuning, Spatial Reasoning | —Unverified | 0 |
| ImplicitQA: Going beyond frames towards Implicit Video Reasoning | Jun 26, 2025 | Spatial Reasoning | Code Available | 0 |
| World-aware Planning Narratives Enhance Large Vision-Language Model Planner | Jun 26, 2025 | Imitation Learning, Language Modeling | —Unverified | 0 |
| ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models | Jun 26, 2025 | Spatial Reasoning, Video Generation | —Unverified | 0 |
| From 2D to 3D Cognition: A Brief Survey of General World Models | Jun 25, 2025 | Autonomous Driving, Scene Generation | —Unverified | 0 |
| Video Perception Models for 3D Scene Synthesis | Jun 25, 2025 | 3D Reconstruction, Image Generation | —Unverified | 0 |
| ImmerseGen: Agent-Guided Immersive World Generation with Alpha-Textured Proxies | Jun 17, 2025 | Scene Generation, Spatial Reasoning | —Unverified | 0 |
| SIRI-Bench: Challenging VLMs' Spatial Intelligence through Complex Reasoning Tasks | Jun 17, 2025 | Math, Spatial Reasoning | —Unverified | 0 |
| PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning | Jun 17, 2025 | General Reinforcement Learning, Multimodal Reasoning | —Unverified | 0 |