| AlphaSpace: Enabling Robotic Actions through Semantic Tokenization and Symbolic Reasoning | Mar 24, 2025 | Spatial Reasoning | Unverified | 0 |
| Aether: Geometric-Aware Unified World Modeling | Mar 24, 2025 | Dynamic Reconstruction, Prediction | Unverified | 0 |
| MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation | Mar 23, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Beyond Semantics: Rediscovering Spatial Awareness in Vision-Language Models | Mar 21, 2025 | Diagnostic, Object Recognition | Unverified | 0 |
| IRef-VLA: A Benchmark for Interactive Referential Grounding with Imperfect Language in 3D Scenes | Mar 20, 2025 | Scene Understanding, Spatial Reasoning | Code Available | 2 |
| Sonata: Self-Supervised Learning of Reliable Point Representations | Mar 20, 2025 | 3D Semantic Segmentation, Self-Supervised Learning | Code Available | 4 |
| A Vision Centric Remote Sensing Benchmark | Mar 20, 2025 | Question Answering, Representation Learning | Unverified | 0 |
| OmniGeo: Towards a Multimodal Large Language Models for Geospatial Artificial Intelligence | Mar 20, 2025 | Instruction Following, Natural Language Understanding | Unverified | 0 |
| Statistical applications of the 20/60/20 rule in risk management and portfolio optimization | Mar 19, 2025 | Management, Portfolio Optimization | Unverified | 0 |
| UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction | Mar 19, 2025 | Navigate, Spatial Reasoning | Unverified | 0 |
| CoSpace: Benchmarking Continuous Space Perception Ability for Vision-Language Models | Mar 18, 2025 | Benchmarking, Spatial Reasoning | Code Available | 0 |
| NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models | Mar 17, 2025 | Question Answering, Scene Understanding | Code Available | 1 |
| Free-form language-based robotic reasoning and grasping | Mar 17, 2025 | Form, Robotic Grasping | Code Available | 2 |
| Grounded Chain-of-Thought for Multimodal Large Language Models | Mar 17, 2025 | Hallucination, Spatial Reasoning | Code Available | 1 |
| VISO-Grasp: Vision-Language Informed Spatial Object-centric 6-DoF Active View Planning and Grasping in Clutter and Invisibility | Mar 16, 2025 | Spatial Reasoning | Code Available | 1 |
| Logic-RAG: Augmenting Large Multimodal Models with Visual-Spatial Knowledge for Road Scene Understanding | Mar 16, 2025 | Autonomous Driving, RAG | Code Available | 1 |
| Open3DVQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space | Mar 14, 2025 | Language Modeling, Language Modelling | Code Available | 1 |
| EmbodiedVSR: Dynamic Scene Graph-Guided Chain-of-Thought Reasoning for Visual Spatial Tasks | Mar 14, 2025 | Spatial Reasoning | Unverified | 0 |
| CleverDistiller: Simple and Spatially Consistent Cross-modal Distillation | Mar 12, 2025 | 3D Object Detection, Autonomous Driving | Unverified | 0 |
| Boosting Diffusion-Based Text Image Super-Resolution Model Towards Generalized Real-World Scenarios | Mar 10, 2025 | Image Restoration, Image Super-Resolution | Unverified | 0 |
| Navigating Motion Agents in Dynamic and Cluttered Environments through LLM Reasoning | Mar 10, 2025 | Autonomous Navigation, Motion Generation | Unverified | 0 |
| PointVLA: Injecting the 3D World into Vision-Language-Action Models | Mar 10, 2025 | Imitation Learning, Spatial Reasoning | Code Available | 4 |
| Towards Ambiguity-Free Spatial Foundation Model: Rethinking and Decoupling Depth Ambiguity | Mar 8, 2025 | Depth Estimation, Scene Understanding | Code Available | 0 |
| An Empirical Study of Conformal Prediction in LLM with ASP Scaffolds for Robust Reasoning | Mar 7, 2025 | Conformal Prediction, Language Modelling | Unverified | 0 |
| Factorio Learning Environment | Mar 6, 2025 | Program Synthesis, Spatial Reasoning | Code Available | 4 |