
Spatial Reasoning

Papers

Showing 151–200 of 453 papers

| Title | Status | Hype |
|---|---|---|
| Leveraging LLMs for Mission Planning in Precision Agriculture | | 0 |
| A Multi-Modal Spatial Risk Framework for EV Charging Infrastructure Using Remote Sensing | | 0 |
| PhyBlock: A Progressive Benchmark for Physical Understanding and Planning via 3D Block Assembly | | 0 |
| Direct Numerical Layout Generation for 3D Indoor Scene Synthesis via Spatial Reasoning | | 0 |
| From Objects to Anywhere: A Holistic Benchmark for Multi-level Visual Grounding in 3D Scenes | | 0 |
| RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics | | 0 |
| SAVVY: Spatial Awareness via Audio-Visual LLMs through Seeing and Hearing | | 0 |
| ReSpace: Text-Driven 3D Scene Synthesis and Editing with Preference Alignment | | 0 |
| OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models | | 0 |
| In-the-wild Audio Spatialization with Flexible Text-guided Localization | Code | 0 |
| Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors | Code | 0 |
| Out of Sight, Not Out of Context? Egocentric Spatial Reasoning in VLMs Across Disjoint Frames | | 0 |
| Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces | | 0 |
| Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence | | 0 |
| MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence | | 0 |
| Grounded Reinforcement Learning for Visual Reasoning | Code | 0 |
| Jigsaw-Puzzles: From Seeing to Understanding to Reasoning in Vision-Language Models | | 0 |
| VLM Can Be a Good Assistant: Enhancing Embodied Visual Tracking with Self-Improving Vision-Language Models | | 0 |
| MEBench: A Novel Benchmark for Understanding Mutual Exclusivity Bias in Vision-Language Models | | 0 |
| Agentic 3D Scene Generation with Spatially Contextualized VLMs | | 0 |
| ViTaPEs: Visuotactile Position Encodings for Cross-Modal Alignment in Multimodal Transformers | | 0 |
| Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps | | 0 |
| Towards Dynamic 3D Reconstruction of Hand-Instrument Interaction in Ophthalmic Surgery | | 0 |
| U2-BENCH: Benchmarking Large Vision-Language Models on Ultrasound Understanding | | 0 |
| Bridging the Dynamic Perception Gap: Training-Free Draft Chain-of-Thought for Dynamic Multimodal Spatial Reasoning | Code | 0 |
| MEgoHand: Multimodal Egocentric Hand-Object Interaction Motion Generation | | 0 |
| VLM-R^3: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought | | 0 |
| DetailMaster: Can Your Text-to-Image Model Handle Long Prompts? | Code | 0 |
| SEM: Enhancing Spatial Understanding for Robust Robot Manipulation | | 0 |
| SPaRC: A Spatial Pathfinding Reasoning Challenge | Code | 0 |
| MMMR: Benchmarking Massive Multi-Modal Reasoning Tasks | | 0 |
| STAR-R1: Spacial TrAnsformation Reasoning by Reinforcing Multimodal LLMs | Code | 0 |
| ReGUIDE: Data Efficient GUI Grounding via Spatial Reasoning and Search | | 0 |
| SPhyR: Spatial-Physical Reasoning Benchmark on Material Distribution | Code | 0 |
| Towards Embodied Cognition in Robots via Spatially Grounded Synthetic Worlds | | 0 |
| From Templates to Natural Language: Generalization Challenges in Instruction-Tuned LLMs for Spatial Reasoning | | 0 |
| Incentivizing Multimodal Reasoning in Large Models for Direct Robot Manipulation | | 0 |
| SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning | | 0 |
| Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind | | 0 |
| Are Multimodal Large Language Models Ready for Omnidirectional Spatial Reasoning? | | 0 |
| PRS-Med: Position Reasoning Segmentation with Vision-Language Model in Medical Imaging | | 0 |
| A Light and Smart Wearable Platform with Multimodal Foundation Model for Enhanced Spatial Reasoning in People with Blindness and Low Vision | | 0 |
| SpatialPrompting: Keyframe-driven Zero-Shot Spatial Reasoning with Off-the-Shelf Multimodal Large Language Models | | 0 |
| SITE: towards Spatial Intelligence Thorough Evaluation | | 0 |
| Preliminary Explorations with GPT-4o(mni) Native Image Generation | | 0 |
| Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models | | 0 |
| FreeInsert: Disentangled Text-Guided Object Insertion in 3D Gaussian Scene without Spatial Priors | | 0 |
| SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models | | 0 |
| First Order Logic with Fuzzy Semantics for Describing and Recognizing Nerves in Medical Images | | 0 |
| SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning | | 0 |

No leaderboard results yet.