| 3DSRBench: A Comprehensive 3D Spatial Reasoning Benchmark | Dec 10, 2024 | Autonomous NavigationSpatial Reasoning | —Unverified | 0 |
| SAT: Dynamic Spatial Aptitude Training for Multimodal Language Models | Dec 10, 2024 | Action RecognitionSpatial Reasoning | CodeCode Available | 4 |
| TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action | Dec 7, 2024 | Depth EstimationMathematical Reasoning | CodeCode Available | 2 |
| VideoSAVi: Self-Aligned Video Language Models without Human Supervision | Dec 1, 2024 | EgoSchemaMVBench | —Unverified | 0 |
| Can Large Language Models Reason about the Region Connection Calculus? | Nov 29, 2024 | Spatial Reasoning | CodeCode Available | 0 |
| Grid-augmented vision: A simple yet effective approach for enhanced spatial understanding in multi-modal agents | Nov 27, 2024 | Autonomous NavigationObject Recognition | CodeCode Available | 0 |
| Dspy-based Neural-Symbolic Pipeline to Enhance Spatial Reasoning in LLMs | Nov 27, 2024 | Logical ReasoningSemantic Parsing | —Unverified | 0 |
| CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos | Nov 26, 2024 | Common Sense ReasoningImitation Learning | CodeCode Available | 3 |
| APT: Architectural Planning and Text-to-Blueprint Construction Using Large Language Models for Open-World Agents | Nov 26, 2024 | Few-Shot LearningLarge Language Model | CodeCode Available | 0 |
| Probing the limitations of multimodal language models for chemistry and materials research | Nov 25, 2024 | Experimental DesignSpatial Reasoning | CodeCode Available | 2 |
| TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation | Nov 25, 2024 | Spatial Reasoning | —Unverified | 0 |
| RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics | Nov 25, 2024 | Robot ManipulationScene Understanding | —Unverified | 0 |
| DriveMLLM: A Benchmark for Spatial Understanding with Multimodal Large Language Models in Autonomous Driving | Nov 20, 2024 | Autonomous Drivingmotion prediction | CodeCode Available | 2 |
| BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games | Nov 20, 2024 | BenchmarkingNetHack | —Unverified | 0 |
| Visual-Linguistic Agent: Towards Collaborative Contextual Object Reasoning | Nov 15, 2024 | DescriptiveObject | —Unverified | 0 |
| Architect: Generating Vivid and Interactive 3D Scenes with Hierarchical 2D Inpainting | Nov 14, 2024 | Depth EstimationImage Inpainting | —Unverified | 0 |
| AI's Spatial Intelligence: Evaluating AI's Understanding of Spatial Transformations in PSVT:R and Augmented Reality | Nov 9, 2024 | Spatial Reasoning | —Unverified | 0 |
| An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models | Nov 9, 2024 | object-detectionObject Detection | CodeCode Available | 1 |
| End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering | Nov 8, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| GPT-4o System Card | Oct 25, 2024 | Multiple-choiceSpatial Reasoning | —Unverified | 0 |
| Geometric Feature Enhanced Knowledge Graph Embedding and Spatial Reasoning | Oct 24, 2024 | Graph EmbeddingKnowledge Graph Embedding | —Unverified | 0 |
| Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction | Oct 24, 2024 | Novel View SynthesisPose Estimation | —Unverified | 0 |
| ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting | Oct 23, 2024 | Decision MakingMinecraft | CodeCode Available | 1 |
| Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities | Oct 22, 2024 | Spatial Reasoning | CodeCode Available | 1 |
| Sparkle: Mastering Basic Spatial Capabilities in Vision Language Models Elicits Generalization to Composite Spatial Reasoning | Oct 21, 2024 | Spatial ReasoningSynthetic Data Generation | —Unverified | 0 |