SOTAVerified

Spatial Reasoning

Papers

Showing 51–100 of 453 papers

Title | Status | Hype
Towards Dynamic 3D Reconstruction of Hand-Instrument Interaction in Ophthalmic Surgery | | 0
Knot So Simple: A Minimalistic Environment for Spatial Reasoning | Code | 1
Bridging the Dynamic Perception Gap: Training-Free Draft Chain-of-Thought for Dynamic Multimodal Spatial Reasoning | Code | 0
MMMR: Benchmarking Massive Multi-Modal Reasoning Tasks | | 0
SPaRC: A Spatial Pathfinding Reasoning Challenge | Code | 0
VLM-R^3: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought | | 0
DetailMaster: Can Your Text-to-Image Model Handle Long Prompts? | Code | 0
MEgoHand: Multimodal Egocentric Hand-Object Interaction Motion Generation | | 0
CoNav: Collaborative Cross-Modal Reasoning for Embodied Navigation | Code | 1
SEM: Enhancing Spatial Understanding for Robust Robot Manipulation | | 0
GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning | Code | 2
SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding | Code | 2
SPhyR: Spatial-Physical Reasoning Benchmark on Material Distribution | Code | 0
ReGUIDE: Data Efficient GUI Grounding via Spatial Reasoning and Search | | 0
STAR-R1: Spacial TrAnsformation Reasoning by Reinforcing Multimodal LLMs | Code | 0
From Templates to Natural Language: Generalization Challenges in Instruction-Tuned LLMs for Spatial Reasoning | | 0
Towards Embodied Cognition in Robots via Spatially Grounded Synthetic Worlds | | 0
Incentivizing Multimodal Reasoning in Large Models for Direct Robot Manipulation | | 0
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning | | 0
Visuospatial Cognitive Assistant | Code | 1
Towards Visuospatial Cognition via Hierarchical Fusion of Visual Experts | Code | 1
Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind | | 0
PRS-Med: Position Reasoning Segmentation with Vision-Language Model in Medical Imaging | | 0
Are Multimodal Large Language Models Ready for Omnidirectional Spatial Reasoning? | | 0
A Light and Smart Wearable Platform with Multimodal Foundation Model for Enhanced Spatial Reasoning in People with Blindness and Low Vision | | 0
From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation | Code | 1
Text-to-CadQuery: A New Paradigm for CAD Generation with Scalable Large Model Capabilities | Code | 2
CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory | Code | 1
SpatialPrompting: Keyframe-driven Zero-Shot Spatial Reasoning with Off-the-Shelf Multimodal Large Language Models | | 0
SITE: towards Spatial Intelligence Thorough Evaluation | | 0
Preliminary Explorations with GPT-4o(mni) Native Image Generation | | 0
Geospatial Mechanistic Interpretability of Large Language Models | Code | 1
Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models | | 0
FreeInsert: Disentangled Text-Guided Object Insertion in 3D Gaussian Scene without Spatial Priors | | 0
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models | | 0
First Order Logic with Fuzzy Semantics for Describing and Recognizing Nerves in Medical Images | | 0
SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning | | 0
Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization | Code | 1
A Review of 3D Object Detection with Vision-Language Models | | 0
SORT3D: Spatial Object-centric Reasoning Toolbox for Zero-Shot 3D Grounding Using Large Language Models | Code | 2
Spatial Reasoner: A 3D Inference Pipeline for XR Applications | | 0
A Call for New Recipes to Enhance Spatial Reasoning in MLLMs | | 0
InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners | Code | 2
Embodied-R: Collaborative Framework for Activating Embodied Spatial Reasoning in Foundation Models via Reinforcement Learning | Code | 2
EarthGPT-X: Enabling MLLMs to Flexibly and Comprehensively Understand Multi-Source Remote Sensing Imagery | | 0
SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding | Code | 1
Intelligence of Things: A Spatial Context-Aware Control System for Smart Devices | | 0
LVLM_CSP: Accelerating Large Vision Language Models via Clustering, Scattering, and Pruning for Reasoning Segmentation | | 0
Embodied World Models Emerge from Navigational Task in Open-Ended Environments | | 0
A Survey of Large Language Model-Powered Spatial Intelligence Across Scales: Advances in Embodied Agents, Smart Cities, and Earth Science | | 0

No leaderboard results yet.