| Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces | Dec 18, 2024 | Question Answering, Spatial Reasoning | Code Available | 4 |
| SAT: Dynamic Spatial Aptitude Training for Multimodal Language Models | Dec 10, 2024 | Action Recognition, Spatial Reasoning | Code Available | 4 |
| VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction | May 26, 2025 | 3D Reconstruction, Spatial Reasoning | Code Available | 3 |
| MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse | Mar 24, 2025 | Layout Generation, Reinforcement Learning (RL) | Code Available | 3 |
| SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation | Feb 18, 2025 | Object Rearrangement, Robot Manipulation | Code Available | 3 |
| CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos | Nov 26, 2024 | Common Sense Reasoning, Imitation Learning | Code Available | 3 |
| SpatialBot: Precise Spatial Understanding with Vision Language Models | Jun 19, 2024 | Spatial Reasoning | Code Available | 3 |
| Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models | Jun 13, 2024 | Math, object-detection | Code Available | 3 |
| Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing | Jun 11, 2025 | Multimodal Reasoning, Spatial Reasoning | Code Available | 2 |
| ThinkGeo: Evaluating Tool-Augmented Agents for Remote Sensing Tasks | May 29, 2025 | Spatial Reasoning | Code Available | 2 |