| InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners | Apr 19, 2025 | Action Generation, Logical Reasoning | Code Available | 2 |
| SpaceR: Reinforcing MLLMs in Video Spatial Reasoning | Apr 2, 2025 | MME, Spatial Reasoning | Code Available | 2 |
| ThinkGeo: Evaluating Tool-Augmented Agents for Remote Sensing Tasks | May 29, 2025 | Spatial Reasoning | Code Available | 2 |
| SpartQA: A Textual Question Answering Benchmark for Spatial Reasoning | Apr 12, 2021 | Question Answering, Spatial Reasoning | Code Available | 1 |
| Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame Benchmark | Jan 8, 2024 | Relation Mapping, Spatial Reasoning | Code Available | 1 |
| SPARTQA: A Textual Question Answering Benchmark for Spatial Reasoning | Jun 1, 2021 | Question Answering, Spatial Reasoning | Code Available | 1 |
| BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues | Oct 20, 2020 | Spatial Reasoning | Code Available | 1 |
| SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding | Apr 17, 2025 | Image Generation, Large Language Model | Code Available | 1 |
| SE-KGE: A Location-Aware Knowledge Graph Embedding Model for Geographic Question Answering and Spatial Semantic Lifting | Apr 25, 2020 | Geographic Question Answering, Graph Embedding | Code Available | 1 |
| SmartPlay: A Benchmark for LLMs as Intelligent Agents | Oct 2, 2023 | Minecraft, Spatial Reasoning | Code Available | 1 |
| ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting | Oct 23, 2024 | Decision Making, Minecraft | Code Available | 1 |
| SPARE3D: A Dataset for SPAtial REasoning on Three-View Line Drawings | Mar 31, 2020 | Spatial Reasoning | Code Available | 1 |
| An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models | Nov 9, 2024 | object-detection, Object Detection | Code Available | 1 |
| ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension | Apr 12, 2022 | image-classification, Image Classification | Code Available | 1 |
| Revisiting spatio-temporal layouts for compositional action recognition | Nov 2, 2021 | Action Classification, Action Detection | Code Available | 1 |
| SBEVNet: End-to-End Deep Stereo Layout Estimation | May 25, 2021 | Depth Estimation, Disparity Estimation | Code Available | 1 |
| A Universal Semantic-Geometric Representation for Robotic Manipulation | Jun 18, 2023 | 3D geometry, Robot Manipulation | Code Available | 1 |
| Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models | Feb 12, 2025 | Attribute, Diagnostic | Code Available | 1 |
| Pix2Shape: Towards Unsupervised Learning of 3D Scenes from Images using a View-based Representation | Mar 23, 2020 | Decoder, Spatial Reasoning | Code Available | 1 |
| DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding | May 10, 2024 | Relation, Spatial Reasoning | Code Available | 1 |
| Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities | Oct 22, 2024 | Spatial Reasoning | Code Available | 1 |
| Seeing is Not Reasoning: MVPBench for Graph-based Evaluation of Multi-path Visual Physical CoT | May 30, 2025 | Spatial Reasoning, Visual Reasoning | Code Available | 1 |
| Spatially Aware Multimodal Transformers for TextVQA | Jul 23, 2020 | Optical Character Recognition (OCR), Spatial Reasoning | Code Available | 1 |
| Self-supervised Spatial Reasoning on Multi-View Line Drawings | Apr 27, 2021 | Binary Classification, Contrastive Learning | Code Available | 1 |
| Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models | Mar 25, 2025 | Benchmarking, Image Captioning | Code Available | 1 |
| Multi-scale GCN-assisted two-stage network for joint segmentation of retinal layers and disc in peripapillary OCT images | Feb 9, 2021 | Decoder, Medical Image Segmentation | Code Available | 1 |
| Logic-RAG: Augmenting Large Multimodal Models with Visual-Spatial Knowledge for Road Scene Understanding | Mar 16, 2025 | Autonomous Driving, RAG | Code Available | 1 |
| AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding | Jun 19, 2024 | Question Answering, Spatial Reasoning | Code Available | 1 |
| Long Range Arena: A Benchmark for Efficient Transformers | Nov 8, 2020 | 16k, Benchmarking | Code Available | 1 |
| NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models | Mar 17, 2025 | Question Answering, Scene Understanding | Code Available | 1 |
| Joint Spatio-Textual Reasoning for Answering Tourism Questions | Sep 28, 2020 | Spatial Reasoning | Code Available | 1 |
| Learning Action and Reasoning-Centric Image Editing from Videos and Simulations | Jul 3, 2024 | Attribute, Spatial Reasoning | Code Available | 1 |
| CLIPort: What and Where Pathways for Robotic Manipulation | Sep 24, 2021 | Imitation Learning, Robotic Grasping | Code Available | 1 |
| iVISPAR -- An Interactive Visual-Spatial Reasoning Benchmark for VLMs | Feb 5, 2025 | Spatial Reasoning | Code Available | 1 |
| Learning and Reasoning with the Graph Structure Representation in Robotic Surgery | Jul 7, 2020 | Edge Classification, Graph Generation | Code Available | 1 |
| Knot So Simple: A Minimalistic Environment for Spatial Reasoning | May 23, 2025 | Model Predictive Control, Spatial Reasoning | Code Available | 1 |
| ING-VP: MLLMs cannot Play Easy Vision-based Games Yet | Oct 9, 2024 | Spatial Reasoning | Code Available | 1 |
| CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory | May 8, 2025 | Large Language Model, Navigate | Code Available | 1 |
| CoNav: Collaborative Cross-Modal Reasoning for Embodied Navigation | May 22, 2025 | Scene Understanding, Spatial Reasoning | Code Available | 1 |
| CityGPT: Empowering Urban Spatial Cognition of Large Language Models | Jun 20, 2024 | Code Generation, Math | Code Available | 1 |
| 3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation | Jun 11, 2025 | Spatial Reasoning | Code Available | 1 |
| LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments | Feb 26, 2024 | Spatial Reasoning | Code Available | 1 |
| On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability | Sep 30, 2024 | Decision Making, Management | Code Available | 1 |
| MineAnyBuild: Benchmarking Spatial Planning for Open-world AI Agents | May 26, 2025 | Benchmarking, Minecraft | Code Available | 1 |
| CityEQA: A Hierarchical LLM Agent on Embodied Question Answering Benchmark in City Space | Feb 18, 2025 | Embodied Question Answering, Question Answering | Code Available | 1 |
| Improved Visual-Spatial Reasoning via R1-Zero-Like Training | Apr 1, 2025 | GPU, Spatial Reasoning | Code Available | 1 |
| Open3DVQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space | Mar 14, 2025 | Language Modeling, Language Modelling | Code Available | 1 |
| OpenKD: Opening Prompt Diversity for Zero- and Few-shot Keypoint Detection | Sep 30, 2024 | Diversity, Keypoint Detection | Code Available | 1 |
| Decoding Language Spatial Relations to 2D Spatial Arrangements | Nov 1, 2020 | Spatial Reasoning | Code Available | 1 |
| HSPFormer: Hierarchical Spatial Perception Transformer for Semantic Segmentation | Jan 16, 2025 | Depth Estimation, Monocular Depth Estimation | Code Available | 1 |