| Social Conjuring: Multi-User Runtime Collaboration with AI in Building Virtual 3D Worlds | Sep 30, 2024 | Spatial Reasoning | —Unverified | 0 |
| Spatial Reasoning and Planning for Deep Embodied Agents | Sep 28, 2024 | Autonomous Driving, Minecraft | —Unverified | 0 |
| DARE: Diverse Visual Question Answering with Robustness Evaluation | Sep 26, 2024 | image-classification, Image Classification | —Unverified | 0 |
| Can Vision Language Models Learn from Visual Demonstrations of Ambiguous Spatial Reasoning? | Sep 25, 2024 | In-Context Learning, Novel Concepts | Code Available | 0 |
| Tag Map: A Text-Based Map for Spatial Reasoning and Navigation with Large Language Models | Sep 23, 2024 | Common Sense Reasoning, Spatial Reasoning | —Unverified | 0 |
| Enhancing Logical Reasoning in Large Language Models through Graph-based Synthetic Data | Sep 19, 2024 | Logical Reasoning, Spatial Reasoning | Code Available | 0 |
| Reasoning Paths with Reference Objects Elicit Quantitative Spatial Reasoning in Large Vision-Language Models | Sep 15, 2024 | Spatial Reasoning | —Unverified | 0 |
| ActionFlow: Equivariant, Accurate, and Efficient Policies with Spatially Symmetric Flow Matching | Sep 6, 2024 | Action Generation, Spatial Reasoning | —Unverified | 0 |
| Cog-GA: A Large Language Models-based Generative Agent for Vision-Language Navigation in Continuous Environments | Sep 4, 2024 | Continual Learning, Navigate | —Unverified | 0 |
| Atari-GPT: Benchmarking Multimodal Large Language Models as Low-Level Policies in Atari Games | Aug 28, 2024 | Atari Games, Benchmarking | —Unverified | 0 |
| AeroVerse: UAV-Agent Benchmark Suite for Simulating, Pre-training, Finetuning, and Evaluating Aerospace Embodied World Models | Aug 28, 2024 | Spatial Reasoning, Task Planning | —Unverified | 0 |
| Poly2Vec: Polymorphic Fourier-Based Encoding of Geospatial Objects for GeoAI Applications | Aug 27, 2024 | Spatial Reasoning | —Unverified | 0 |
| Can LLM be a Good Path Planner based on Prompt Engineering? Mitigating the Hallucination for Path Planning | Aug 23, 2024 | Hallucination, Prompt Engineering | —Unverified | 0 |
| Narrowing the Gap between Vision and Action in Navigation | Aug 19, 2024 | Decoder, Spatial Reasoning | Code Available | 0 |
| Beyond the Hype: A dispassionate look at vision-language models in medical scenario | Aug 16, 2024 | Question Answering, Spatial Reasoning | —Unverified | 0 |
| SceneGPT: A Language Model for 3D Scene Understanding | Aug 13, 2024 | In-Context Learning, Language Modeling | —Unverified | 0 |
| Space-LLaVA: a Vision-Language Model Adapted to Extraterrestrial Applications | Aug 12, 2024 | Instruction Following, Language Modeling | —Unverified | 0 |
| REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models | Aug 5, 2024 | Question Answering, Spatial Reasoning | —Unverified | 0 |
| Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model | Aug 1, 2024 | EgoSchema, Language Modeling | —Unverified | 0 |
| I Know About "Up"! Enhancing Spatial Reasoning in Visual Language Models Through 3D Reconstruction | Jul 19, 2024 | 3D Reconstruction, Spatial Reasoning | —Unverified | 0 |
| OpenSU3D: Open World 3D Scene Understanding using Foundation Models | Jul 19, 2024 | Scene Understanding, Spatial Reasoning | —Unverified | 0 |
| A LLM Benchmark based on the Minecraft Builder Dialog Agent Task | Jul 17, 2024 | Math, Minecraft | —Unverified | 0 |
| Show, Don't Tell: Evaluating Large Language Models Beyond Textual Understanding with ChildPlay | Jul 12, 2024 | Spatial Reasoning | Code Available | 0 |
| GRASP: A Grid-Based Benchmark for Evaluating Commonsense Spatial Reasoning | Jul 2, 2024 | Spatial Reasoning | —Unverified | 0 |
| FlowVQA: Mapping Multimodal Logic in Visual Question Answering with Flowcharts | Jun 27, 2024 | Decision Making, Logical Reasoning | —Unverified | 0 |
| Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities | Jun 20, 2024 | Spatial Reasoning, Visual Reasoning | —Unverified | 0 |
| GSR-BENCH: A Benchmark for Grounded Spatial Reasoning Evaluation via Multimodal LLMs | Jun 19, 2024 | Spatial Reasoning, Visual Reasoning | —Unverified | 0 |
| Neuro-symbolic Training for Reasoning over Spatial Language | Jun 19, 2024 | Spatial Reasoning, Transfer Learning | Code Available | 0 |
| WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences | Jun 16, 2024 | Benchmarking, Spatial Reasoning | —Unverified | 0 |
| RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics | Jun 15, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| SpaRC and SpaRP: Spatial Reasoning Characterization and Path Generation for Understanding Spatial Reasoning Capability of Large Language Models | Jun 7, 2024 | Spatial Reasoning | Code Available | 0 |
| Quantifying Geospatial in the Common Crawl Corpus | Jun 7, 2024 | Language Modeling, Language Modelling | —Unverified | 0 |
| SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models | Jun 3, 2024 | Language Modelling, Spatial Reasoning | —Unverified | 0 |
| Reframing Spatial Reasoning Evaluation in Language Models: A Real-World Simulation Benchmark for Qualitative Reasoning | May 23, 2024 | Logical Reasoning Question Answering, Spatial Reasoning | Code Available | 0 |
| Can Large Language Models Create New Knowledge for Spatial Reasoning Tasks? | May 23, 2024 | Spatial Reasoning | —Unverified | 0 |
| Generating Human Motion in 3D Scenes from Text Descriptions | May 13, 2024 | Motion Generation, Object | —Unverified | 0 |
| RoboHop: Segment-based Topological Map Representation for Open-World Visual Navigation | May 9, 2024 | Natural Language Queries, Robot Navigation | —Unverified | 0 |
| Beyond Human Vision: The Role of Large Vision Language Models in Microscope Image Analysis | May 1, 2024 | Image Captioning, Question Answering | —Unverified | 0 |
| Re-Thinking Inverse Graphics With Large Language Models | Apr 23, 2024 | Language Modelling, Large Language Model | —Unverified | 0 |
| Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs | Apr 11, 2024 | Descriptive, Hallucination | Code Available | 0 |
| HAMMR: HierArchical MultiModal React agents for generic VQA | Apr 8, 2024 | Optical Character Recognition (OCR), Question Answering | —Unverified | 0 |
| Challenges Faced by Large Language Models in Solving Multi-Agent Flocking | Apr 6, 2024 | Decision Making, Spatial Reasoning | —Unverified | 0 |
| Grounding Spatial Relations in Text-Only Language Models | Mar 20, 2024 | Spatial Reasoning | Code Available | 0 |
| SpatialPIN: Enhancing Spatial Reasoning Capabilities of Vision-Language Models through Prompting and Interacting 3D Priors | Mar 18, 2024 | Hallucination, Motion Planning | —Unverified | 0 |
| JSTR: Joint Spatio-Temporal Reasoning for Event-based Moving Object Detection | Mar 12, 2024 | Motion Compensation, Moving Object Detection | —Unverified | 0 |
| DivCon: Divide and Conquer for Progressive Text-to-Image Generation | Mar 11, 2024 | Image Generation, Layout-to-Image Generation | —Unverified | 0 |
| Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training | Mar 4, 2024 | Math, Phrase Grounding | —Unverified | 0 |
| A Surprising Failure? Multimodal LLMs and the NLVR Challenge | Feb 26, 2024 | Sentence, Spatial Reasoning | —Unverified | 0 |
| DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models | Feb 19, 2024 | Autonomous Driving, Scene Understanding | —Unverified | 0 |
| PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs | Feb 12, 2024 | Instruction Following, Logical Reasoning | —Unverified | 0 |