| Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark | Oct 6, 2024 | Mathematical Reasoning, Spatial Reasoning | Code Available | 0 |
| Evaluation of Code LLMs on Geospatial Code Generation | Oct 6, 2024 | Code Generation, Spatial Reasoning | Code Available | 0 |
| SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Models | Oct 4, 2024 | Scene Understanding, Spatial Reasoning | Unverified | 0 |
| Social Conjuring: Multi-User Runtime Collaboration with AI in Building Virtual 3D Worlds | Sep 30, 2024 | Spatial Reasoning | Unverified | 0 |
| OpenKD: Opening Prompt Diversity for Zero- and Few-shot Keypoint Detection | Sep 30, 2024 | Diversity, Keypoint Detection | Code Available | 1 |
| VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs | Sep 30, 2024 | EgoSchema, Language Modelling | Code Available | 1 |
| On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability | Sep 30, 2024 | Decision Making, Management | Code Available | 1 |
| Spatial Reasoning and Planning for Deep Embodied Agents | Sep 28, 2024 | Autonomous Driving, Minecraft | Unverified | 0 |
| DARE: Diverse Visual Question Answering with Robustness Evaluation | Sep 26, 2024 | image-classification, Image Classification | Unverified | 0 |
| Can Vision Language Models Learn from Visual Demonstrations of Ambiguous Spatial Reasoning? | Sep 25, 2024 | In-Context Learning, Novel Concepts | Code Available | 0 |