| VISO-Grasp: Vision-Language Informed Spatial Object-centric 6-DoF Active View Planning and Grasping in Clutter and Invisibility | Mar 16, 2025 | Spatial Reasoning | Code Available | 1 |
| Open3DVQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space | Mar 14, 2025 | Language Modeling, Language Modelling | Code Available | 1 |
| CityEQA: A Hierarchical LLM Agent on Embodied Question Answering Benchmark in City Space | Feb 18, 2025 | Embodied Question Answering, Question Answering | Code Available | 1 |
| Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models | Feb 12, 2025 | Attribute, Diagnostic | Code Available | 1 |
| iVISPAR -- An Interactive Visual-Spatial Reasoning Benchmark for VLMs | Feb 5, 2025 | Spatial Reasoning | Code Available | 1 |
| Enhancing Reasoning to Adapt Large Language Models for Domain-Specific Applications | Feb 5, 2025 | In-Context Learning, Language Modeling | Code Available | 1 |
| HSPFormer: Hierarchical Spatial Perception Transformer for Semantic Segmentation | Jan 16, 2025 | Depth Estimation, Monocular Depth Estimation | Code Available | 1 |
| An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models | Nov 9, 2024 | object-detection, Object Detection | Code Available | 1 |
| ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting | Oct 23, 2024 | Decision Making, Minecraft | Code Available | 1 |
| Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities | Oct 22, 2024 | Spatial Reasoning | Code Available | 1 |
| ING-VP: MLLMs cannot Play Easy Vision-based Games Yet | Oct 9, 2024 | Spatial Reasoning | Code Available | 1 |
| VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs | Sep 30, 2024 | EgoSchema, Language Modelling | Code Available | 1 |
| OpenKD: Opening Prompt Diversity for Zero- and Few-shot Keypoint Detection | Sep 30, 2024 | Diversity, Keypoint Detection | Code Available | 1 |
| On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability | Sep 30, 2024 | Decision Making, Management | Code Available | 1 |
| Learning Action and Reasoning-Centric Image Editing from Videos and Simulations | Jul 3, 2024 | Attribute, Spatial Reasoning | Code Available | 1 |
| CityGPT: Empowering Urban Spatial Cognition of Large Language Models | Jun 20, 2024 | Code Generation, Math | Code Available | 1 |
| AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding | Jun 19, 2024 | Question Answering, Spatial Reasoning | Code Available | 1 |
| TopViewRS: Vision-Language Models as Top-View Spatial Reasoners | Jun 4, 2024 | Multiple-choice, Spatial Reasoning | Code Available | 1 |
| DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding | May 10, 2024 | Relation, Spatial Reasoning | Code Available | 1 |
| Mind's Eye of LLMs: Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models | Apr 4, 2024 | Spatial Reasoning, Visual Navigation | Code Available | 1 |
| LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments | Feb 26, 2024 | Spatial Reasoning | Code Available | 1 |
| Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame Benchmark | Jan 8, 2024 | Relation Mapping, Spatial Reasoning | Code Available | 1 |
| What's "up" with vision-language models? Investigating their struggle with spatial reasoning | Oct 30, 2023 | Spatial Reasoning | Code Available | 1 |
| Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning | Oct 19, 2023 | MuJoCo, Prompt Engineering | Code Available | 1 |
| Can Large Language Models be Good Path Planners? A Benchmark and Investigation on Spatial-temporal Reasoning | Oct 5, 2023 | Navigate, Spatial Reasoning | Code Available | 1 |