| Act3D: 3D Feature Field Transformers for Multi-Task Robotic Manipulation | Jun 30, 2023 | Action DetectionPose Prediction | CodeCode Available | 2 |
| LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models | May 23, 2023 | Common Sense ReasoningImage Generation | CodeCode Available | 2 |
| ConceptFusion: Open-set Multimodal 3D Mapping | Feb 14, 2023 | 3D geometryAutonomous Driving | CodeCode Available | 2 |
| Warehouse Spatial Question Answering with LLM Agent | Jul 14, 2025 | Question AnsweringSpatial Reasoning | CodeCode Available | 1 |
| 3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation | Jun 11, 2025 | Spatial Reasoning | CodeCode Available | 1 |
| Unfolding Spatial Cognition: Evaluating Multimodal Models on Visual Simulations | Jun 5, 2025 | 4kSpatial Reasoning | CodeCode Available | 1 |
| VideoCAD: A Large-Scale Video Dataset for Learning UI Interactions and 3D Reasoning from CAD Software | May 30, 2025 | Question AnsweringSpatial Reasoning | CodeCode Available | 1 |
| Seeing is Not Reasoning: MVPBench for Graph-based Evaluation of Multi-path Visual Physical CoT | May 30, 2025 | Spatial ReasoningVisual Reasoning | CodeCode Available | 1 |
| Uni-MuMER: Unified Multi-Task Fine-Tuning of Vision-Language Model for Handwritten Mathematical Expression Recognition | May 29, 2025 | Handwritten Mathmatical Expression RecognitionLanguage Modeling | CodeCode Available | 1 |
| ChatVLA-2: Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge | May 28, 2025 | Imitation LearningMath | CodeCode Available | 1 |
| MineAnyBuild: Benchmarking Spatial Planning for Open-world AI Agents | May 26, 2025 | BenchmarkingMinecraft | CodeCode Available | 1 |
| Knot So Simple: A Minimalistic Environment for Spatial Reasoning | May 23, 2025 | Model Predictive ControlSpatial Reasoning | CodeCode Available | 1 |
| CoNav: Collaborative Cross-Modal Reasoning for Embodied Navigation | May 22, 2025 | Scene UnderstandingSpatial Reasoning | CodeCode Available | 1 |
| Visuospatial Cognitive Assistant | May 18, 2025 | Spatial Reasoning | CodeCode Available | 1 |
| Towards Visuospatial Cognition via Hierarchical Fusion of Visual Experts | May 18, 2025 | Spatial Reasoning | CodeCode Available | 1 |
| From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation | May 13, 2025 | Robot ManipulationSpatial Reasoning | CodeCode Available | 1 |
| CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory | May 8, 2025 | Large Language ModelNavigate | CodeCode Available | 1 |
| Geospatial Mechanistic Interpretability of Large Language Models | May 6, 2025 | Spatial Reasoning | CodeCode Available | 1 |
| Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization | Apr 25, 2025 | Spatial Reasoning | CodeCode Available | 1 |
| SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding | Apr 17, 2025 | Image GenerationLarge Language Model | CodeCode Available | 1 |
| Improved Visual-Spatial Reasoning via R1-Zero-Like Training | Apr 1, 2025 | GPUSpatial Reasoning | CodeCode Available | 1 |
| Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models | Mar 25, 2025 | BenchmarkingImage Captioning | CodeCode Available | 1 |
| NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models | Mar 17, 2025 | Question AnsweringScene Understanding | CodeCode Available | 1 |
| Grounded Chain-of-Thought for Multimodal Large Language Models | Mar 17, 2025 | HallucinationSpatial Reasoning | CodeCode Available | 1 |
| Logic-RAG: Augmenting Large Multimodal Models with Visual-Spatial Knowledge for Road Scene Understanding | Mar 16, 2025 | Autonomous DrivingRAG | CodeCode Available | 1 |