| Learning Action and Reasoning-Centric Image Editing from Videos and Simulations | Jul 3, 2024 | AttributeSpatial Reasoning | CodeCode Available | 1 |
| GRASP: A Grid-Based Benchmark for Evaluating Commonsense Spatial Reasoning | Jul 2, 2024 | Spatial Reasoning | —Unverified | 0 |
| FlowVQA: Mapping Multimodal Logic in Visual Question Answering with Flowcharts | Jun 27, 2024 | Decision MakingLogical Reasoning | —Unverified | 0 |
| Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models | Jun 21, 2024 | Spatial Reasoning | CodeCode Available | 2 |
| Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities | Jun 20, 2024 | Spatial ReasoningVisual Reasoning | —Unverified | 0 |
| CityGPT: Empowering Urban Spatial Cognition of Large Language Models | Jun 20, 2024 | Code GenerationMath | CodeCode Available | 1 |
| GSR-BENCH: A Benchmark for Grounded Spatial Reasoning Evaluation via Multimodal LLMs | Jun 19, 2024 | Spatial ReasoningVisual Reasoning | —Unverified | 0 |
| Neuro-symbolic Training for Reasoning over Spatial Language | Jun 19, 2024 | Spatial ReasoningTransfer Learning | CodeCode Available | 0 |
| AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding | Jun 19, 2024 | Question AnsweringSpatial Reasoning | CodeCode Available | 1 |
| SpatialBot: Precise Spatial Understanding with Vision Language Models | Jun 19, 2024 | Spatial Reasoning | CodeCode Available | 3 |