| Locality Alignment Improves Vision-Language Models | Oct 14, 2024 | Semantic Segmentation, Spatial Reasoning | Code Available | 2 | 5 |
| Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing | Jun 11, 2025 | Multimodal Reasoning, Spatial Reasoning | Code Available | 2 | 5 |
| DriveMLLM: A Benchmark for Spatial Understanding with Multimodal Large Language Models in Autonomous Driving | Nov 20, 2024 | Autonomous Driving, motion prediction | Code Available | 2 | 5 |
| NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models | Mar 17, 2025 | Question Answering, Scene Understanding | Code Available | 1 | 5 |
| Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame Benchmark | Jan 8, 2024 | Relation Mapping, Spatial Reasoning | Code Available | 1 | 5 |
| On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability | Sep 30, 2024 | Decision Making, Management | Code Available | 1 | 5 |
| BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues | Oct 20, 2020 | Spatial Reasoning | Code Available | 1 | 5 |
| HSPFormer: Hierarchical Spatial Perception Transformer for Semantic Segmentation | Jan 16, 2025 | Depth Estimation, Monocular Depth Estimation | Code Available | 1 | 5 |
| Grounded Chain-of-Thought for Multimodal Large Language Models | Mar 17, 2025 | Hallucination, Spatial Reasoning | Code Available | 1 | 5 |
| Grounding Consistency: Distilling Spatial Common Sense for Precise Visual Relationship Detection | Jan 1, 2021 | Common Sense Reasoning, Graph Generation | Code Available | 1 | 5 |
| GuessWhat?! Visual object discovery through multi-modal dialogue | Nov 23, 2016 | Object, Object Discovery | Code Available | 1 | 5 |
| Multi-scale GCN-assisted two-stage network for joint segmentation of retinal layers and disc in peripapillary OCT images | Feb 9, 2021 | Decoder, Medical Image Segmentation | Code Available | 1 | 5 |
| MineAnyBuild: Benchmarking Spatial Planning for Open-world AI Agents | May 26, 2025 | Benchmarking, Minecraft | Code Available | 1 | 5 |
| From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation | May 13, 2025 | Robot Manipulation, Spatial Reasoning | Code Available | 1 | 5 |
| An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models | Nov 9, 2024 | object-detection, Object Detection | Code Available | 1 | 5 |
| Long Range Arena: A Benchmark for Efficient Transformers | Nov 8, 2020 | 16k, Benchmarking | Code Available | 1 | 5 |
| Geospatial Mechanistic Interpretability of Large Language Models | May 6, 2025 | Spatial Reasoning | Code Available | 1 | 5 |
| A Universal Semantic-Geometric Representation for Robotic Manipulation | Jun 18, 2023 | 3D geometry, Robot Manipulation | Code Available | 1 | 5 |
| Logic-RAG: Augmenting Large Multimodal Models with Visual-Spatial Knowledge for Road Scene Understanding | Mar 16, 2025 | Autonomous Driving, RAG | Code Available | 1 | 5 |
| Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models | Mar 25, 2025 | Benchmarking, Image Captioning | Code Available | 1 | 5 |
| Open3DVQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space | Mar 14, 2025 | Language Modeling, Language Modelling | Code Available | 1 | 5 |
| Enhancing Reasoning to Adapt Large Language Models for Domain-Specific Applications | Feb 5, 2025 | In-Context Learning, Language Modeling | Code Available | 1 | 5 |
| 3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation | Jun 11, 2025 | Spatial Reasoning | Code Available | 1 | 5 |
| Learning and Reasoning with the Graph Structure Representation in Robotic Surgery | Jul 7, 2020 | Edge Classification, Graph Generation | Code Available | 1 | 5 |
| End-to-End Egospheric Spatial Memory | Feb 15, 2021 | General Reinforcement Learning, Imitation Learning | Code Available | 1 | 5 |