| Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction | Oct 24, 2024 | Novel View Synthesis, Pose Estimation | Unverified | 0 |
| Geometric Feature Enhanced Knowledge Graph Embedding and Spatial Reasoning | Oct 24, 2024 | Graph Embedding, Knowledge Graph Embedding | Unverified | 0 |
| ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting | Oct 23, 2024 | Decision Making, Minecraft | Code Available | 1 |
| Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities | Oct 22, 2024 | Spatial Reasoning | Code Available | 1 |
| Sparkle: Mastering Basic Spatial Capabilities in Vision Language Models Elicits Generalization to Composite Spatial Reasoning | Oct 21, 2024 | Spatial Reasoning, Synthetic Data Generation | Unverified | 0 |
| Locality Alignment Improves Vision-Language Models | Oct 14, 2024 | Semantic Segmentation, Spatial Reasoning | Code Available | 2 |
| Testing GPT-4-o1-preview on math and science problems: A follow-up study | Oct 11, 2024 | Math, Spatial Reasoning | Unverified | 0 |
| Aerial Vision-and-Language Navigation via Semantic-Topo-Metric Representation Guided LLM Reasoning | Oct 11, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| Structured Spatial Reasoning with Open Vocabulary Object Detectors | Oct 9, 2024 | Object, Object Rearrangement | Unverified | 0 |
| ING-VP: MLLMs cannot Play Easy Vision-based Games Yet | Oct 9, 2024 | Spatial Reasoning | Code Available | 1 |