| Narrowing the Gap between Vision and Action in Navigation | Aug 19, 2024 | Decoder, Spatial Reasoning | Code Available | 0 |
| Beyond the Hype: A dispassionate look at vision-language models in medical scenario | Aug 16, 2024 | Question Answering, Spatial Reasoning | Unverified | 0 |
| SceneGPT: A Language Model for 3D Scene Understanding | Aug 13, 2024 | In-Context Learning, Language Modeling | Unverified | 0 |
| Space-LLaVA: a Vision-Language Model Adapted to Extraterrestrial Applications | Aug 12, 2024 | Instruction Following, Language Modeling | Unverified | 0 |
| REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models | Aug 5, 2024 | Question Answering, Spatial Reasoning | Unverified | 0 |
| Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model | Aug 1, 2024 | EgoSchema, Language Modeling | Unverified | 0 |
| OpenSU3D: Open World 3D Scene Understanding using Foundation Models | Jul 19, 2024 | Scene Understanding, Spatial Reasoning | Unverified | 0 |
| I Know About "Up"! Enhancing Spatial Reasoning in Visual Language Models Through 3D Reconstruction | Jul 19, 2024 | 3D Reconstruction, Spatial Reasoning | Unverified | 0 |
| A LLM Benchmark based on the Minecraft Builder Dialog Agent Task | Jul 17, 2024 | Math, Minecraft | Unverified | 0 |
| Show, Don't Tell: Evaluating Large Language Models Beyond Textual Understanding with ChildPlay | Jul 12, 2024 | Spatial Reasoning | Code Available | 0 |
| Learning Action and Reasoning-Centric Image Editing from Videos and Simulations | Jul 3, 2024 | Attribute, Spatial Reasoning | Code Available | 1 |
| GRASP: A Grid-Based Benchmark for Evaluating Commonsense Spatial Reasoning | Jul 2, 2024 | Spatial Reasoning | Unverified | 0 |
| FlowVQA: Mapping Multimodal Logic in Visual Question Answering with Flowcharts | Jun 27, 2024 | Decision Making, Logical Reasoning | Unverified | 0 |
| Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models | Jun 21, 2024 | Spatial Reasoning | Code Available | 2 |
| Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities | Jun 20, 2024 | Spatial Reasoning, Visual Reasoning | Unverified | 0 |
| CityGPT: Empowering Urban Spatial Cognition of Large Language Models | Jun 20, 2024 | Code Generation, Math | Code Available | 1 |
| GSR-BENCH: A Benchmark for Grounded Spatial Reasoning Evaluation via Multimodal LLMs | Jun 19, 2024 | Spatial Reasoning, Visual Reasoning | Unverified | 0 |
| Neuro-symbolic Training for Reasoning over Spatial Language | Jun 19, 2024 | Spatial Reasoning, Transfer Learning | Code Available | 0 |
| AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding | Jun 19, 2024 | Question Answering, Spatial Reasoning | Code Available | 1 |
| SpatialBot: Precise Spatial Understanding with Vision Language Models | Jun 19, 2024 | Spatial Reasoning | Code Available | 3 |
| WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences | Jun 16, 2024 | Benchmarking, Spatial Reasoning | Unverified | 0 |
| RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics | Jun 15, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models | Jun 13, 2024 | Math, object-detection | Code Available | 3 |
| Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples | Jun 9, 2024 | ARC, Diversity | Code Available | 2 |
| Quantifying Geospatial in the Common Crawl Corpus | Jun 7, 2024 | Language Modeling, Language Modelling | Unverified | 0 |