| SAT: Dynamic Spatial Aptitude Training for Multimodal Language Models | Dec 10, 2024 | Action Recognition, Spatial Reasoning | Code Available | 4 | 5 |
| Video-R1: Reinforcing Video Reasoning in MLLMs | Mar 27, 2025 | MVBench, Reinforcement Learning (RL) | Code Available | 4 | 5 |
| CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos | Nov 26, 2024 | Common Sense Reasoning, Imitation Learning | Code Available | 3 | 5 |
| Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models | Jun 13, 2024 | Math, Object Detection | Code Available | 3 | 5 |
| VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction | May 26, 2025 | 3D Reconstruction, Spatial Reasoning | Code Available | 3 | 5 |
| SpatialBot: Precise Spatial Understanding with Vision Language Models | Jun 19, 2024 | Spatial Reasoning | Code Available | 3 | 5 |
| MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse | Mar 24, 2025 | Layout Generation, Reinforcement Learning (RL) | Code Available | 3 | 5 |
| SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation | Feb 18, 2025 | Object Rearrangement, Robot Manipulation | Code Available | 3 | 5 |
| GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning | May 22, 2025 | Attribute, Image Generation | Code Available | 2 | 5 |
| From Flatland to Space: Teaching Vision-Language Models to Perceive and Reason in 3D | Mar 29, 2025 | Spatial Reasoning | Code Available | 2 | 5 |