SOTAVerified

Spatial Reasoning

Papers

Showing 226-250 of 453 papers

Title | Status | Hype
A Survey of Large Language Model-Powered Spatial Intelligence Across Scales: Advances in Embodied Agents, Smart Cities, and Earth Science | | 0
A Symbolic Representation of Human Posture for Interpretable Learning and Reasoning | | 0
Atari-GPT: Benchmarking Multimodal Large Language Models as Low-Level Policies in Atari Games | | 0
AuxDepthNet: Real-Time Monocular 3D Object Detection with Depth-Sensitive Features | | 0
A Vision Centric Remote Sensing Benchmark | | 0
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games | | 0
Beyond Human Vision: The Role of Large Vision Language Models in Microscope Image Analysis | | 0
Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models | | 0
Beyond Semantics: Rediscovering Spatial Awareness in Vision-Language Models | | 0
Beyond the Hype: A dispassionate look at vision-language models in medical scenario | | 0
Boosting Diffusion-Based Text Image Super-Resolution Model Towards Generalized Real-World Scenarios | | 0
Bridging Visualization and Optimization: Multimodal Large Language Models on Graph-Structured Combinatorial Optimization | | 0
ByDeWay: Boost Your multimodal LLM with DEpth prompting in a Training-Free Way | | 0
CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs | | 0
Can Large Language Models Create New Knowledge for Spatial Reasoning Tasks? | | 0
Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind | | 0
Can LLM be a Good Path Planner based on Prompt Engineering? Mitigating the Hallucination for Path Planning | | 0
Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps | | 0
CASPER: Cognitive Architecture for Social Perception and Engagement in Robots | | 0
Chain of Semantics Programming in 3D Gaussian Splatting Representation for 3D Vision Grounding | | 0
Challenge of Spatial Cognition for Deep Learning | | 0
Challenges Faced by Large Language Models in Solving Multi-Agent Flocking | | 0
CleverDistiller: Simple and Spatially Consistent Cross-modal Distillation | | 0
Cog-GA: A Large Language Models-based Generative Agent for Vision-Language Navigation in Continuous Environments | | 0
Combining Deep Learning and Qualitative Spatial Reasoning to Learn Complex Structures from Sparse Examples with Noise | | 0
Page 10 of 19

No leaderboard results yet.