| Navigating Motion Agents in Dynamic and Cluttered Environments through LLM Reasoning | Mar 10, 2025 | Autonomous Navigation, Motion Generation | —Unverified | 0 |
| Beyond the Hype: A dispassionate look at vision-language models in medical scenario | Aug 16, 2024 | Question Answering, Spatial Reasoning | —Unverified | 0 |
| DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models | Feb 19, 2024 | Autonomous Driving, Scene Understanding | —Unverified | 0 |
| Beyond Semantics: Rediscovering Spatial Awareness in Vision-Language Models | Mar 21, 2025 | Diagnostic, Object Recognition | —Unverified | 0 |
| An Empirical Study of Conformal Prediction in LLM with ASP Scaffolds for Robust Reasoning | Mar 7, 2025 | Conformal Prediction, Language Modelling | —Unverified | 0 |
| Do Multimodal Language Models Really Understand Direction? A Benchmark for Compass Direction Reasoning | Dec 21, 2024 | Spatial Reasoning | —Unverified | 0 |
| Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models | May 3, 2025 | Diagnostic, Object Recognition | —Unverified | 0 |
| DivCon: Divide and Conquer for Progressive Text-to-Image Generation | Mar 11, 2024 | Image Generation, Layout-to-Image Generation | —Unverified | 0 |
| Distortions in Judged Spatial Relations in Large Language Models | Jan 8, 2024 | Misconceptions, Spatial Reasoning | —Unverified | 0 |
| Beyond Human Vision: The Role of Large Vision Language Models in Microscope Image Analysis | May 1, 2024 | Image Captioning, Question Answering | —Unverified | 0 |
| BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games | Nov 20, 2024 | Benchmarking, NetHack | —Unverified | 0 |
| Direct Numerical Layout Generation for 3D Indoor Scene Synthesis via Spatial Reasoning | Jun 5, 2025 | In-Context Learning, Indoor Scene Synthesis | —Unverified | 0 |
| A Multi-Modal Spatial Risk Framework for EV Charging Infrastructure Using Remote Sensing | Jun 10, 2025 | Spatial Reasoning | —Unverified | 0 |
| Dialectical language model evaluation: An initial appraisal of the commonsense spatial reasoning abilities of LLMs | Apr 22, 2023 | Language Model Evaluation, Language Modeling | —Unverified | 0 |
| DetailMaster: Can Your Text-to-Image Model Handle Long Prompts? | May 22, 2025 | Attribute, Spatial Reasoning | —Unverified | 0 |
| A Vision Centric Remote Sensing Benchmark | Mar 20, 2025 | Question Answering, Representation Learning | —Unverified | 0 |
| A dual contrastive framework | Dec 13, 2024 | Contrastive Learning, Decoder | —Unverified | 0 |
| Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model | Aug 1, 2024 | EgoSchema, Language Modeling | —Unverified | 0 |
| AuxDepthNet: Real-Time Monocular 3D Object Detection with Depth-Sensitive Features | Jan 7, 2025 | 3D Object Detection, Computational Efficiency | —Unverified | 0 |
| AlphaSpace: Enabling Robotic Actions through Semantic Tokenization and Symbolic Reasoning | Mar 24, 2025 | Spatial Reasoning | —Unverified | 0 |
| DataPlatter: Boosting Robotic Manipulation Generalization with Minimal Costly Data | Mar 25, 2025 | Robot Manipulation, Spatial Reasoning | —Unverified | 0 |
| DARE: Diverse Visual Question Answering with Robustness Evaluation | Sep 26, 2024 | image-classification, Image Classification | —Unverified | 0 |
| Atari-GPT: Benchmarking Multimodal Large Language Models as Low-Level Policies in Atari Games | Aug 28, 2024 | Atari Games, Benchmarking | —Unverified | 0 |
| Space-LLaVA: a Vision-Language Model Adapted to Extraterrestrial Applications | Aug 12, 2024 | Instruction Following, Language Modeling | —Unverified | 0 |
| A Light and Smart Wearable Platform with Multimodal Foundation Model for Enhanced Spatial Reasoning in People with Blindness and Low Vision | May 16, 2025 | Large Language Model, Navigate | —Unverified | 0 |