| Navigating Motion Agents in Dynamic and Cluttered Environments through LLM Reasoning | Mar 10, 2025 | Autonomous Navigation, Motion Generation | —Unverified | 0 |
| Beyond the Hype: A dispassionate look at vision-language models in medical scenario | Aug 16, 2024 | Question Answering, Spatial Reasoning | —Unverified | 0 |
| DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models | Feb 19, 2024 | Autonomous Driving, Scene Understanding | —Unverified | 0 |
| Beyond Semantics: Rediscovering Spatial Awareness in Vision-Language Models | Mar 21, 2025 | Diagnostic, Object Recognition | —Unverified | 0 |
| An Empirical Study of Conformal Prediction in LLM with ASP Scaffolds for Robust Reasoning | Mar 7, 2025 | Conformal Prediction, Language Modelling | —Unverified | 0 |
| HAMMR: HierArchical MultiModal React agents for generic VQA | Apr 8, 2024 | Optical Character Recognition (OCR), Question Answering | —Unverified | 0 |
| Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation | Dec 7, 2023 | Spatial Reasoning, Text-to-Video Generation | —Unverified | 0 |
| Do Multimodal Language Models Really Understand Direction? A Benchmark for Compass Direction Reasoning | Dec 21, 2024 | Spatial Reasoning | —Unverified | 0 |
| Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models | May 3, 2025 | Diagnostic, Object Recognition | —Unverified | 0 |
| DivCon: Divide and Conquer for Progressive Text-to-Image Generation | Mar 11, 2024 | Image Generation, Layout-to-Image Generation | —Unverified | 0 |