| CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory | May 8, 2025 | Large Language ModelNavigate | CodeCode Available | 1 | 5 |
| CityGPT: Empowering Urban Spatial Cognition of Large Language Models | Jun 20, 2024 | Code GenerationMath | CodeCode Available | 1 | 5 |
| NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models | Mar 17, 2025 | Question AnsweringScene Understanding | CodeCode Available | 1 | 5 |
| OpenKD: Opening Prompt Diversity for Zero- and Few-shot Keypoint Detection | Sep 30, 2024 | DiversityKeypoint Detection | CodeCode Available | 1 | 5 |
| CityEQA: A Hierarchical LLM Agent on Embodied Question Answering Benchmark in City Space | Feb 18, 2025 | Embodied Question AnsweringQuestion Answering | CodeCode Available | 1 | 5 |
| Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models | Mar 25, 2025 | BenchmarkingImage Captioning | CodeCode Available | 1 | 5 |
| End-to-End Egospheric Spatial Memory | Feb 15, 2021 | General Reinforcement LearningImitation Learning | CodeCode Available | 1 | 5 |
| MineAnyBuild: Benchmarking Spatial Planning for Open-world AI Agents | May 26, 2025 | BenchmarkingMinecraft | CodeCode Available | 1 | 5 |
| CoNav: Collaborative Cross-Modal Reasoning for Embodied Navigation | May 22, 2025 | Scene UnderstandingSpatial Reasoning | CodeCode Available | 1 | 5 |
| DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions | Sep 7, 2023 | PositionSpatial Reasoning | CodeCode Available | 1 | 5 |