SOTAVerified

Spatial Reasoning

Papers

Showing 301–325 of 453 papers

Title | Status | Hype
Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models | | 0
Beyond Semantics: Rediscovering Spatial Awareness in Vision-Language Models | | 0
Beyond the Hype: A dispassionate look at vision-language models in medical scenario | | 0
Boosting Diffusion-Based Text Image Super-Resolution Model Towards Generalized Real-World Scenarios | | 0
Bridging Visualization and Optimization: Multimodal Large Language Models on Graph-Structured Combinatorial Optimization | | 0
ByDeWay: Boost Your multimodal LLM with DEpth prompting in a Training-Free Way | | 0
CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs | | 0
Can Large Language Models Create New Knowledge for Spatial Reasoning Tasks? | | 0
Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind | | 0
Can LLM be a Good Path Planner based on Prompt Engineering? Mitigating the Hallucination for Path Planning | | 0
Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps | | 0
CASPER: Cognitive Architecture for Social Perception and Engagement in Robots | | 0
Chain of Semantics Programming in 3D Gaussian Splatting Representation for 3D Vision Grounding | | 0
Challenge of Spatial Cognition for Deep Learning | | 0
Challenges Faced by Large Language Models in Solving Multi-Agent Flocking | | 0
CleverDistiller: Simple and Spatially Consistent Cross-modal Distillation | | 0
Cog-GA: A Large Language Models-based Generative Agent for Vision-Language Navigation in Continuous Environments | | 0
Combining Deep Learning and Qualitative Spatial Reasoning to Learn Complex Structures from Sparse Examples with Noise | | 0
Commonsense Spatial Reasoning for Visually Intelligent Agents | | 0
Commonsense Visual Sensemaking for Autonomous Driving: On Generalised Neurosymbolic Online Abduction Integrating Vision and Semantics | | 0
Complexity Classification in Infinite-Domain Constraint Satisfaction | | 0
Contextual Reasoning for Scene Generation (Technical Report) | | 0
Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training | | 0
Controllable Text-to-Image Generation with GPT-4 | | 0
DARE: Diverse Visual Question Answering with Robustness Evaluation | | 0
Page 13 of 19

No leaderboard results yet.