| IRef-VLA: A Benchmark for Interactive Referential Grounding with Imperfect Language in 3D Scenes | Mar 20, 2025 | Scene UnderstandingSpatial Reasoning | CodeCode Available | 2 | 5 |
| Act3D: 3D Feature Field Transformers for Multi-Task Robotic Manipulation | Jun 30, 2023 | Action DetectionPose Prediction | CodeCode Available | 2 | 5 |
| Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models | Jun 21, 2024 | Spatial Reasoning | CodeCode Available | 2 | 5 |
| Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes | Aug 17, 2023 | Language ModelingLanguage Modelling | CodeCode Available | 2 | 5 |
| Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead | Mar 31, 2025 | MathSpatial Reasoning | CodeCode Available | 2 | 5 |
| Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning | Dec 16, 2024 | HallucinationRobot Manipulation | CodeCode Available | 2 | 5 |
| Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks | Mar 27, 2025 | Imitation LearningMathematical Reasoning | CodeCode Available | 2 | 5 |
| End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering | Nov 8, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 2 | 5 |
| InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners | Apr 19, 2025 | Action GenerationLogical Reasoning | CodeCode Available | 2 | 5 |
| GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning | May 22, 2025 | AttributeImage Generation | CodeCode Available | 2 | 5 |