| ROCKET-2: Steering Visuomotor Policy via Cross-View Goal Alignment | Mar 4, 2025 | Minecraft, Spatial Reasoning | Unverified | 0 |
| Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas | Mar 3, 2025 | Spatial Reasoning | Code Available | 2 |
| FoREST: Frame of Reference Evaluation in Spatial Reasoning Tasks | Feb 25, 2025 | Image Generation, Layout Generation | Code Available | 0 |
| Introducing Visual Perception Token into Multimodal Large Language Model | Feb 24, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| VisFactor: Benchmarking Fundamental Visual Cognition in Multimodal Large Language Models | Feb 23, 2025 | Benchmarking, Spatial Reasoning | Code Available | 0 |
| From Text to Space: Mapping Abstract Spatial Models in LLMs during a Grid-World Navigation Task | Feb 23, 2025 | Decision Making, Navigate | Code Available | 0 |
| AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO | Feb 20, 2025 | Autonomous Navigation, Navigate | Code Available | 2 |
| Mem2Ego: Empowering Vision-Language Models with Global-to-Ego Memory for Long-Horizon Embodied Navigation | Feb 20, 2025 | Decision Making, Efficient Exploration | Unverified | 0 |
| CityEQA: A Hierarchical LLM Agent on Embodied Question Answering Benchmark in City Space | Feb 18, 2025 | Embodied Question Answering, Question Answering | Code Available | 1 |
| SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation | Feb 18, 2025 | Object Rearrangement, Robot Manipulation | Code Available | 3 |