| 3DSRBench: A Comprehensive 3D Spatial Reasoning Benchmark | Dec 10, 2024 | Autonomous Navigation, Spatial Reasoning | Unverified | 0 |
| SAT: Dynamic Spatial Aptitude Training for Multimodal Language Models | Dec 10, 2024 | Action Recognition, Spatial Reasoning | Code Available | 4 |
| TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action | Dec 7, 2024 | Depth Estimation, Mathematical Reasoning | Code Available | 2 |
| VideoSAVi: Self-Aligned Video Language Models without Human Supervision | Dec 1, 2024 | EgoSchema, MVBench | Unverified | 0 |
| Can Large Language Models Reason about the Region Connection Calculus? | Nov 29, 2024 | Spatial Reasoning | Code Available | 0 |
| Grid-augmented vision: A simple yet effective approach for enhanced spatial understanding in multi-modal agents | Nov 27, 2024 | Autonomous Navigation, Object Recognition | Code Available | 0 |
| Dspy-based Neural-Symbolic Pipeline to Enhance Spatial Reasoning in LLMs | Nov 27, 2024 | Logical Reasoning, Semantic Parsing | Unverified | 0 |
| CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos | Nov 26, 2024 | Common Sense Reasoning, Imitation Learning | Code Available | 3 |
| APT: Architectural Planning and Text-to-Blueprint Construction Using Large Language Models for Open-World Agents | Nov 26, 2024 | Few-Shot Learning, Large Language Model | Code Available | 0 |
| Probing the limitations of multimodal language models for chemistry and materials research | Nov 25, 2024 | Experimental Design, Spatial Reasoning | Code Available | 2 |