| A Pilot Evaluation of ChatGPT and DALL-E 2 on Decision Making and Spatial Reasoning | Feb 15, 2023 | Decision Making, Spatial Reasoning | —Unverified | 0 | 0 |
| Embodied Scene Understanding for Vision Language Models via MetaVQA | Jan 15, 2025 | Decision Making, Question Answering | —Unverified | 0 | 0 |
| Embodied Chain of Action Reasoning with Multi-Modal Foundation Model for Humanoid Loco-manipulation | Apr 13, 2025 | Navigate, Object Rearrangement | —Unverified | 0 | 0 |
| Boosting Diffusion-Based Text Image Super-Resolution Model Towards Generalized Real-World Scenarios | Mar 10, 2025 | Image Restoration, Image Super-Resolution | —Unverified | 0 | 0 |
| An Evaluation of ChatGPT-4's Qualitative Spatial Reasoning Capabilities in RCC-8 | Sep 27, 2023 | Spatial Reasoning | —Unverified | 0 | 0 |
| Ego-Humans: An Ego-Centric 3D Multi-Human Benchmark | Jan 1, 2023 | 3D Pose Estimation, Human Detection | —Unverified | 0 | 0 |
| Ego-Centric Spatial Memory Networks | Jan 1, 2021 | CPU, GPU | —Unverified | 0 | 0 |
| EarthGPT-X: Enabling MLLMs to Flexibly and Comprehensively Understand Multi-Source Remote Sensing Imagery | Apr 17, 2025 | Large Language Model, Multi-Task Learning | —Unverified | 0 | 0 |
| A Neural Representation Framework with LLM-Driven Spatial Reasoning for Open-Vocabulary 3D Visual Grounding | Jul 9, 2025 | 3D visual grounding, Autonomous Navigation | —Unverified | 0 | 0 |
| Navigating Motion Agents in Dynamic and Cluttered Environments through LLM Reasoning | Mar 10, 2025 | Autonomous Navigation, Motion Generation | —Unverified | 0 | 0 |
| Beyond the Hype: A dispassionate look at vision-language models in medical scenario | Aug 16, 2024 | Question Answering, Spatial Reasoning | —Unverified | 0 | 0 |
| DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models | Feb 19, 2024 | Autonomous Driving, Scene Understanding | —Unverified | 0 | 0 |
| Learning to encode spatial relations from natural language | May 1, 2019 | Spatial Reasoning | —Unverified | 0 | 0 |
| Beyond Semantics: Rediscovering Spatial Awareness in Vision-Language Models | Mar 21, 2025 | Diagnostic, Object Recognition | —Unverified | 0 | 0 |
| An Empirical Study of Conformal Prediction in LLM with ASP Scaffolds for Robust Reasoning | Mar 7, 2025 | Conformal Prediction, Language Modelling | —Unverified | 0 | 0 |
| Advancing Egocentric Video Question Answering with Multimodal Large Language Models | Apr 6, 2025 | Object Recognition, Question Answering | —Unverified | 0 | 0 |
| 3D-MoE: A Mixture-of-Experts Multi-modal LLM for 3D Vision and Pose Diffusion via Rectified Flow | Jan 28, 2025 | Instruction Following, Mixture-of-Experts | —Unverified | 0 | 0 |
| Learning event representation: As sparse as possible, but not sparser | Oct 2, 2017 | Classification, General Classification | —Unverified | 0 | 0 |
| Large Language Models and Mathematical Reasoning Failures | Feb 17, 2025 | Mathematical Reasoning, Physical Intuition | —Unverified | 0 | 0 |
| Do Multimodal Language Models Really Understand Direction? A Benchmark for Compass Direction Reasoning | Dec 21, 2024 | Spatial Reasoning | —Unverified | 0 | 0 |
| Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models | May 3, 2025 | Diagnostic, Object Recognition | —Unverified | 0 | 0 |
| Large Language-Geometry Model: When LLM meets Equivariance | Feb 16, 2025 | model, Spatial Reasoning | —Unverified | 0 | 0 |
| LanguageRefer: Spatial-Language Model for 3D Visual Grounding | Jul 7, 2021 | 3D visual grounding, Language Modeling | —Unverified | 0 | 0 |
| DivCon: Divide and Conquer for Progressive Text-to-Image Generation | Mar 11, 2024 | Image Generation, Layout-to-Image Generation | —Unverified | 0 | 0 |
| LABNet: Local Graph Aggregation Network with Class Balanced Loss for Vehicle Re-Identification | Nov 29, 2020 | Spatial Reasoning, Vehicle Re-Identification | —Unverified | 0 | 0 |