| Spatial Reasoner: A 3D Inference Pipeline for XR Applications | Apr 25, 2025 | Spatial Reasoning | —Unverified | 0 |
| A Review of 3D Object Detection with Vision-Language Models | Apr 25, 2025 | 3D Object Detection, Object | —Unverified | 0 |
| A Call for New Recipes to Enhance Spatial Reasoning in MLLMs | Apr 21, 2025 | Spatial Reasoning | —Unverified | 0 |
| EarthGPT-X: Enabling MLLMs to Flexibly and Comprehensively Understand Multi-Source Remote Sensing Imagery | Apr 17, 2025 | Large Language Model, Multi-Task Learning | —Unverified | 0 |
| Intelligence of Things: A Spatial Context-Aware Control System for Smart Devices | Apr 16, 2025 | Spatial Reasoning | —Unverified | 0 |
| LVLM_CSP: Accelerating Large Vision Language Models via Clustering, Scattering, and Pruning for Reasoning Segmentation | Apr 15, 2025 | Image Captioning, Question Answering | —Unverified | 0 |
| Embodied World Models Emerge from Navigational Task in Open-Ended Environments | Apr 15, 2025 | Meta Reinforcement Learning, Spatial Reasoning | —Unverified | 0 |
| Perturbed State Space Feature Encoders for Optical Flow with Event Cameras | Apr 14, 2025 | Event-based Optical Flow, Optical Flow Estimation | —Unverified | 0 |
| A Survey of Large Language Model-Powered Spatial Intelligence Across Scales: Advances in Embodied Agents, Smart Cities, and Earth Science | Apr 14, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| Foundation Models for Remote Sensing: An Analysis of MLLMs for Object Localization | Apr 14, 2025 | Benchmarking, Earth Observation | —Unverified | 0 |
| VisualPuzzles: Decoupling Multimodal Reasoning Evaluation from Domain Knowledge | Apr 14, 2025 | Logical Reasoning, Multimodal Reasoning | —Unverified | 0 |
| Embodied Chain of Action Reasoning with Multi-Modal Foundation Model for Humanoid Loco-manipulation | Apr 13, 2025 | Navigate, Object Rearrangement | —Unverified | 0 |
| 3D CoCa: Contrastive Learners are 3D Captioners | Apr 13, 2025 | 3D dense captioning, Caption Generation | Code Available | 0 |
| VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search | Apr 12, 2025 | Spatial Reasoning | —Unverified | 0 |
| AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations | Apr 10, 2025 | Spatial Reasoning, Visual Grounding | —Unverified | 0 |
| Endowing Embodied Agents with Spatial Reasoning Capabilities for Vision-and-Language Navigation | Apr 9, 2025 | Hallucination, Spatial Reasoning | —Unverified | 0 |
| How to Enable LLM with 3D Capacity? A Survey of Spatial Reasoning in LLM | Apr 8, 2025 | Autonomous Vehicles, Spatial Reasoning | —Unverified | 0 |
| Towards Visual Text Grounding of Multimodal Large Language Model | Apr 7, 2025 | Benchmarking, Language Modeling | —Unverified | 0 |
| Advancing Egocentric Video Question Answering with Multimodal Large Language Models | Apr 6, 2025 | Object Recognition, Question Answering | —Unverified | 0 |
| NuScenes-SpatialQA: A Spatial Understanding and Reasoning Benchmark for Vision-Language Models in Autonomous Driving | Apr 4, 2025 | 3d scene graph generation, Autonomous Driving | —Unverified | 0 |
| Enabling Systematic Generalization in Abstract Spatial Reasoning through Meta-Learning for Compositionality | Apr 2, 2025 | Meta-Learning, Spatial Reasoning | Code Available | 0 |
| RSRWKV: A Linear-Complexity 2D Attention Mechanism for Efficient Remote Sensing Vision Task | Mar 26, 2025 | Spatial Reasoning | —Unverified | 0 |
| LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning? | Mar 25, 2025 | Autonomous Navigation, Question Answering | —Unverified | 0 |
| ST-VLM: Kinematic Instruction Tuning for Spatio-Temporal Reasoning in Vision-Language Models | Mar 25, 2025 | 4D reconstruction, Autonomous Driving | —Unverified | 0 |
| DataPlatter: Boosting Robotic Manipulation Generalization with Minimal Costly Data | Mar 25, 2025 | Robot Manipulation, Spatial Reasoning | —Unverified | 0 |
| Aether: Geometric-Aware Unified World Modeling | Mar 24, 2025 | Dynamic Reconstruction, Prediction | —Unverified | 0 |
| AlphaSpace: Enabling Robotic Actions through Semantic Tokenization and Symbolic Reasoning | Mar 24, 2025 | Spatial Reasoning | —Unverified | 0 |
| MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation | Mar 23, 2025 | Language Modeling, Language Modelling | —Unverified | 0 |
| Beyond Semantics: Rediscovering Spatial Awareness in Vision-Language Models | Mar 21, 2025 | Diagnostic, Object Recognition | —Unverified | 0 |
| A Vision Centric Remote Sensing Benchmark | Mar 20, 2025 | Question Answering, Representation Learning | —Unverified | 0 |
| OmniGeo: Towards a Multimodal Large Language Models for Geospatial Artificial Intelligence | Mar 20, 2025 | Instruction Following, Natural Language Understanding | —Unverified | 0 |
| UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction | Mar 19, 2025 | Navigate, Spatial Reasoning | —Unverified | 0 |
| Statistical applications of the 20/60/20 rule in risk management and portfolio optimization | Mar 19, 2025 | Management, Portfolio Optimization | —Unverified | 0 |
| CoSpace: Benchmarking Continuous Space Perception Ability for Vision-Language Models | Mar 18, 2025 | Benchmarking, Spatial Reasoning | Code Available | 0 |
| EmbodiedVSR: Dynamic Scene Graph-Guided Chain-of-Thought Reasoning for Visual Spatial Tasks | Mar 14, 2025 | Spatial Reasoning | —Unverified | 0 |
| CleverDistiller: Simple and Spatially Consistent Cross-modal Distillation | Mar 12, 2025 | 3D Object Detection, Autonomous Driving | —Unverified | 0 |
| Boosting Diffusion-Based Text Image Super-Resolution Model Towards Generalized Real-World Scenarios | Mar 10, 2025 | Image Restoration, Image Super-Resolution | —Unverified | 0 |
| Navigating Motion Agents in Dynamic and Cluttered Environments through LLM Reasoning | Mar 10, 2025 | Autonomous Navigation, Motion Generation | —Unverified | 0 |
| Towards Ambiguity-Free Spatial Foundation Model: Rethinking and Decoupling Depth Ambiguity | Mar 8, 2025 | Depth Estimation, Scene Understanding | Code Available | 0 |
| An Empirical Study of Conformal Prediction in LLM with ASP Scaffolds for Robust Reasoning | Mar 7, 2025 | Conformal Prediction, Language Modelling | —Unverified | 0 |
| ROCKET-2: Steering Visuomotor Policy via Cross-View Goal Alignment | Mar 4, 2025 | Minecraft, Spatial Reasoning | —Unverified | 0 |
| FoREST: Frame of Reference Evaluation in Spatial Reasoning Tasks | Feb 25, 2025 | Image Generation, Layout Generation | Code Available | 0 |
| VisFactor: Benchmarking Fundamental Visual Cognition in Multimodal Large Language Models | Feb 23, 2025 | Benchmarking, Spatial Reasoning | Code Available | 0 |
| From Text to Space: Mapping Abstract Spatial Models in LLMs during a Grid-World Navigation Task | Feb 23, 2025 | Decision Making, Navigate | Code Available | 0 |
| Mem2Ego: Empowering Vision-Language Models with Global-to-Ego Memory for Long-Horizon Embodied Navigation | Feb 20, 2025 | Decision Making, Efficient Exploration | —Unverified | 0 |
| Large Language Models and Mathematical Reasoning Failures | Feb 17, 2025 | Mathematical Reasoning, Physical Intuition | —Unverified | 0 |
| Large Language-Geometry Model: When LLM meets Equivariance | Feb 16, 2025 | model, Spatial Reasoning | —Unverified | 0 |
| STMA: A Spatio-Temporal Memory Agent for Long-Horizon Embodied Task Planning | Feb 14, 2025 | Decision Making, Spatial Reasoning | —Unverified | 0 |
| A Solver-Aided Hierarchical Language for LLM-Driven CAD Design | Feb 13, 2025 | Spatial Reasoning | —Unverified | 0 |
| Visual Agentic AI for Spatial Reasoning with a Dynamic API | Feb 10, 2025 | Program Synthesis, Spatial Reasoning | —Unverified | 0 |