| Locality Alignment Improves Vision-Language Models | Oct 14, 2024 | Semantic Segmentation, Spatial Reasoning | Code Available | 2 |
| Testing GPT-4-o1-preview on math and science problems: A follow-up study | Oct 11, 2024 | Math, Spatial Reasoning | Unverified | 0 |
| Aerial Vision-and-Language Navigation via Semantic-Topo-Metric Representation Guided LLM Reasoning | Oct 11, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Structured Spatial Reasoning with Open Vocabulary Object Detectors | Oct 9, 2024 | Object, Object Rearrangement | Unverified | 0 |
| ING-VP: MLLMs cannot Play Easy Vision-based Games Yet | Oct 9, 2024 | Spatial Reasoning | Code Available | 1 |
| Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark | Oct 6, 2024 | Mathematical Reasoning, Spatial Reasoning | Code Available | 0 |
| Evaluation of Code LLMs on Geospatial Code Generation | Oct 6, 2024 | Code Generation, Spatial Reasoning | Code Available | 0 |
| SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Models | Oct 4, 2024 | Scene Understanding, Spatial Reasoning | Unverified | 0 |
| Social Conjuring: Multi-User Runtime Collaboration with AI in Building Virtual 3D Worlds | Sep 30, 2024 | Spatial Reasoning | Unverified | 0 |
| OpenKD: Opening Prompt Diversity for Zero- and Few-shot Keypoint Detection | Sep 30, 2024 | Diversity, Keypoint Detection | Code Available | 1 |
| VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs | Sep 30, 2024 | EgoSchema, Language Modelling | Code Available | 1 |
| On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability | Sep 30, 2024 | Decision Making, Management | Code Available | 1 |
| Spatial Reasoning and Planning for Deep Embodied Agents | Sep 28, 2024 | Autonomous Driving, Minecraft | Unverified | 0 |
| DARE: Diverse Visual Question Answering with Robustness Evaluation | Sep 26, 2024 | image-classification, Image Classification | Unverified | 0 |
| Can Vision Language Models Learn from Visual Demonstrations of Ambiguous Spatial Reasoning? | Sep 25, 2024 | In-Context Learning, Novel Concepts | Code Available | 0 |
| Tag Map: A Text-Based Map for Spatial Reasoning and Navigation with Large Language Models | Sep 23, 2024 | Common Sense Reasoning, Spatial Reasoning | Unverified | 0 |
| Enhancing Logical Reasoning in Large Language Models through Graph-based Synthetic Data | Sep 19, 2024 | Logical Reasoning, Spatial Reasoning | Code Available | 0 |
| Reasoning Paths with Reference Objects Elicit Quantitative Spatial Reasoning in Large Vision-Language Models | Sep 15, 2024 | Spatial Reasoning | Unverified | 0 |
| ActionFlow: Equivariant, Accurate, and Efficient Policies with Spatially Symmetric Flow Matching | Sep 6, 2024 | Action Generation, Spatial Reasoning | Unverified | 0 |
| Cog-GA: A Large Language Models-based Generative Agent for Vision-Language Navigation in Continuous Environments | Sep 4, 2024 | Continual Learning, Navigate | Unverified | 0 |
| AeroVerse: UAV-Agent Benchmark Suite for Simulating, Pre-training, Finetuning, and Evaluating Aerospace Embodied World Models | Aug 28, 2024 | Spatial Reasoning, Task Planning | Unverified | 0 |
| Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation | Aug 28, 2024 | Object, Semantic Segmentation | Code Available | 2 |
| Atari-GPT: Benchmarking Multimodal Large Language Models as Low-Level Policies in Atari Games | Aug 28, 2024 | Atari Games, Benchmarking | Unverified | 0 |
| Poly2Vec: Polymorphic Fourier-Based Encoding of Geospatial Objects for GeoAI Applications | Aug 27, 2024 | Spatial Reasoning | Unverified | 0 |
| Can LLM be a Good Path Planner based on Prompt Engineering? Mitigating the Hallucination for Path Planning | Aug 23, 2024 | Hallucination, Prompt Engineering | Unverified | 0 |