| CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs | Dec 27, 2024 | Spatial Reasoning | —Unverified | 0 | 0 |
| Can Large Language Models Create New Knowledge for Spatial Reasoning Tasks? | May 23, 2024 | Spatial Reasoning | —Unverified | 0 | 0 |
| Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind | May 18, 2025 | Benchmarking, Scene Understanding | —Unverified | 0 | 0 |
| Can LLM be a Good Path Planner based on Prompt Engineering? Mitigating the Hallucination for Path Planning | Aug 23, 2024 | Hallucination, Prompt Engineering | —Unverified | 0 | 0 |
| Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps | May 24, 2025 | Scene Understanding, Spatial Reasoning | —Unverified | 0 | 0 |
| CASPER: Cognitive Architecture for Social Perception and Engagement in Robots | Sep 1, 2022 | Action Recognition, Navigate | —Unverified | 0 | 0 |
| Chain of Semantics Programming in 3D Gaussian Splatting Representation for 3D Vision Grounding | Jan 1, 2025 | 3DGS, Large Language Model | —Unverified | 0 | 0 |
| Challenge of Spatial Cognition for Deep Learning | Jul 30, 2019 | Deep Learning, Spatial Reasoning | —Unverified | 0 | 0 |
| Challenges Faced by Large Language Models in Solving Multi-Agent Flocking | Apr 6, 2024 | Decision Making, Spatial Reasoning | —Unverified | 0 | 0 |
| CleverDistiller: Simple and Spatially Consistent Cross-modal Distillation | Mar 12, 2025 | 3D Object Detection, Autonomous Driving | —Unverified | 0 | 0 |
| Cog-GA: A Large Language Models-based Generative Agent for Vision-Language Navigation in Continuous Environments | Sep 4, 2024 | Continual Learning, Navigate | —Unverified | 0 | 0 |
| Combining Deep Learning and Qualitative Spatial Reasoning to Learn Complex Structures from Sparse Examples with Noise | Nov 27, 2018 | AI Agent, Heuristic Search | —Unverified | 0 | 0 |
| Commonsense Spatial Reasoning for Visually Intelligent Agents | Apr 1, 2021 | Spatial Reasoning | —Unverified | 0 | 0 |
| Commonsense Visual Sensemaking for Autonomous Driving: On Generalised Neurosymbolic Online Abduction Integrating Vision and Semantics | Dec 28, 2020 | Autonomous Driving, Question Answering | —Unverified | 0 | 0 |
| Complexity Classification in Infinite-Domain Constraint Satisfaction | Jan 4, 2012 | Classification, General Classification | —Unverified | 0 | 0 |
| Contextual Reasoning for Scene Generation (Technical Report) | May 3, 2023 | Scene Generation, Spatial Reasoning | —Unverified | 0 | 0 |
| Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training | Mar 4, 2024 | Math, Phrase Grounding | —Unverified | 0 | 0 |
| Controllable Text-to-Image Generation with GPT-4 | May 29, 2023 | Image Generation, Instruction Following | —Unverified | 0 | 0 |
| DARE: Diverse Visual Question Answering with Robustness Evaluation | Sep 26, 2024 | image-classification, Image Classification | —Unverified | 0 | 0 |
| DataPlatter: Boosting Robotic Manipulation Generalization with Minimal Costly Data | Mar 25, 2025 | Robot Manipulation, Spatial Reasoning | —Unverified | 0 | 0 |
| Dialectical language model evaluation: An initial appraisal of the commonsense spatial reasoning abilities of LLMs | Apr 22, 2023 | Language Model Evaluation, Language Modeling | —Unverified | 0 | 0 |
| Direct Numerical Layout Generation for 3D Indoor Scene Synthesis via Spatial Reasoning | Jun 5, 2025 | In-Context Learning, Indoor Scene Synthesis | —Unverified | 0 | 0 |
| Distortions in Judged Spatial Relations in Large Language Models | Jan 8, 2024 | Misconceptions, Spatial Reasoning | —Unverified | 0 | 0 |
| DivCon: Divide and Conquer for Progressive Text-to-Image Generation | Mar 11, 2024 | Image Generation, Layout-to-Image Generation | —Unverified | 0 | 0 |
| Do Multimodal Language Models Really Understand Direction? A Benchmark for Compass Direction Reasoning | Dec 21, 2024 | Spatial Reasoning | —Unverified | 0 | 0 |
| DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models | Feb 19, 2024 | Autonomous Driving, Scene Understanding | —Unverified | 0 | 0 |
| Navigating Motion Agents in Dynamic and Cluttered Environments through LLM Reasoning | Mar 10, 2025 | Autonomous Navigation, Motion Generation | —Unverified | 0 | 0 |
| EarthGPT-X: Enabling MLLMs to Flexibly and Comprehensively Understand Multi-Source Remote Sensing Imagery | Apr 17, 2025 | Large Language Model, Multi-Task Learning | —Unverified | 0 | 0 |
| Ego-Centric Spatial Memory Networks | Jan 1, 2021 | CPU, GPU | —Unverified | 0 | 0 |
| Ego-Humans: An Ego-Centric 3D Multi-Human Benchmark | Jan 1, 2023 | 3D Pose Estimation, Human Detection | —Unverified | 0 | 0 |
| Embodied Chain of Action Reasoning with Multi-Modal Foundation Model for Humanoid Loco-manipulation | Apr 13, 2025 | Navigate, Object Rearrangement | —Unverified | 0 | 0 |
| Embodied Scene Understanding for Vision Language Models via MetaVQA | Jan 15, 2025 | Decision Making, Question Answering | —Unverified | 0 | 0 |
| EmbodiedVSR: Dynamic Scene Graph-Guided Chain-of-Thought Reasoning for Visual Spatial Tasks | Mar 14, 2025 | Spatial Reasoning | —Unverified | 0 | 0 |
| Embodied World Models Emerge from Navigational Task in Open-Ended Environments | Apr 15, 2025 | Meta Reinforcement Learning, Spatial Reasoning | —Unverified | 0 | 0 |
| EmbRACE-3K: Embodied Reasoning and Action in Complex Environments | Jul 14, 2025 | Scene Understanding, Spatial Reasoning | —Unverified | 0 | 0 |
| Endowing Embodied Agents with Spatial Reasoning Capabilities for Vision-and-Language Navigation | Apr 9, 2025 | Hallucination, Spatial Reasoning | —Unverified | 0 | 0 |
| Evaluating Robustness of Visual Representations for Object Assembly Task Requiring Spatio-Geometrical Reasoning | Oct 15, 2023 | Benchmarking, Spatial Reasoning | —Unverified | 0 | 0 |
| Explicit Object Relation Alignment for Vision and Language Navigation | Nov 16, 2021 | Instruction Following, Relation | —Unverified | 0 | 0 |
| Exploring and Improving the Spatial Reasoning Abilities of Large Language Models | Dec 2, 2023 | Spatial Reasoning | —Unverified | 0 | 0 |
| Exploring Spatial Language Grounding Through Referring Expressions | Feb 4, 2025 | Image Captioning, Negation | —Unverified | 0 | 0 |
| Exploring The Spatial Reasoning Ability of Neural Models in Human IQ Tests | Apr 11, 2020 | Question Answering, Spatial Reasoning | —Unverified | 0 | 0 |
| Fine-grained Qualitative Spatial Reasoning about Point Positions | Nov 15, 2019 | Spatial Reasoning | —Unverified | 0 | 0 |
| First Order Logic with Fuzzy Semantics for Describing and Recognizing Nerves in Medical Images | Apr 30, 2025 | Spatial Reasoning | —Unverified | 0 | 0 |
| FlowVQA: Mapping Multimodal Logic in Visual Question Answering with Flowcharts | Jun 27, 2024 | Decision Making, Logical Reasoning | —Unverified | 0 | 0 |
| FollowEval: A Multi-Dimensional Benchmark for Assessing the Instruction-Following Capability of Large Language Models | Nov 16, 2023 | Instruction Following, Logical Reasoning | —Unverified | 0 | 0 |
| Following Instructions by Imagining and Reaching Visual Goals | Jan 25, 2020 | Instruction Following, Reinforcement Learning | —Unverified | 0 | 0 |
| Foundation Models for Remote Sensing: An Analysis of MLLMs for Object Localization | Apr 14, 2025 | Benchmarking, Earth Observation | —Unverified | 0 | 0 |
| FreeInsert: Disentangled Text-Guided Object Insertion in 3D Gaussian Scene without Spatial Priors | May 2, 2025 | Object, Spatial Reasoning | —Unverified | 0 | 0 |
| From 2D to 3D Cognition: A Brief Survey of General World Models | Jun 25, 2025 | Autonomous Driving, Scene Generation | —Unverified | 0 | 0 |
| From Objects to Anywhere: A Holistic Benchmark for Multi-level Visual Grounding in 3D Scenes | Jun 5, 2025 | 3D visual grounding, Object | —Unverified | 0 | 0 |