| HAMMR: HierArchical MultiModal React agents for generic VQA | Apr 8, 2024 | Optical Character Recognition (OCR)Question Answering | —Unverified | 0 |
| Contextual Reasoning for Scene Generation (Technical Report) | May 3, 2023 | Scene GenerationSpatial Reasoning | —Unverified | 0 |
| A Survey of Large Language Model-Powered Spatial Intelligence Across Scales: Advances in Embodied Agents, Smart Cities, and Earth Science | Apr 14, 2025 | Language ModelingLanguage Modelling | —Unverified | 0 |
| Large Language Models and Mathematical Reasoning Failures | Feb 17, 2025 | Mathematical ReasoningPhysical Intuition | —Unverified | 0 |
| GSR-BENCH: A Benchmark for Grounded Spatial Reasoning Evaluation via Multimodal LLMs | Jun 19, 2024 | Spatial ReasoningVisual Reasoning | —Unverified | 0 |
| A Survey for Foundation Models in Autonomous Driving | Feb 2, 2024 | 3D Object DetectionAutonomous Driving | —Unverified | 0 |
| Incentivizing Multimodal Reasoning in Large Models for Direct Robot Manipulation | May 19, 2025 | Multimodal ReasoningRobot Manipulation | —Unverified | 0 |
| A LLM Benchmark based on the Minecraft Builder Dialog Agent Task | Jul 17, 2024 | MathMinecraft | —Unverified | 0 |
| LanguageRefer: Spatial-Language Model for 3D Visual Grounding | Jul 7, 2021 | 3D visual groundingLanguage Modeling | —Unverified | 0 |
| Grounded Reinforcement Learning for Visual Reasoning | May 29, 2025 | reinforcement-learningReinforcement Learning | —Unverified | 0 |