SOTAVerified

Spatial Reasoning

Papers

Showing 201-250 of 453 papers

Title | Status | Hype
Advancing Egocentric Video Question Answering with Multimodal Large Language Models | | 0
AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations | | 0
Aerial Vision-and-Language Navigation via Semantic-Topo-Metric Representation Guided LLM Reasoning | | 0
AeroVerse: UAV-Agent Benchmark Suite for Simulating, Pre-training, Finetuning, and Evaluating Aerospace Embodied World Models | | 0
Aether: Geometric-Aware Unified World Modeling | | 0
Agentic 3D Scene Generation with Spatially Contextualized VLMs | | 0
AI's Spatial Intelligence: Evaluating AI's Understanding of Spatial Transformations in PSVT:R and Augmented Reality | | 0
A LLM Benchmark based on the Minecraft Builder Dialog Agent Task | | 0
AlphaSpace: Enabling Robotic Actions through Semantic Tokenization and Symbolic Reasoning | | 0
A Multi-Modal Spatial Risk Framework for EV Charging Infrastructure Using Remote Sensing | | 0
An Empirical Study of Conformal Prediction in LLM with ASP Scaffolds for Robust Reasoning | | 0
A Neural Representation Framework with LLM-Driven Spatial Reasoning for Open-Vocabulary 3D Visual Grounding | | 0
An Evaluation of ChatGPT-4's Qualitative Spatial Reasoning Capabilities in RCC-8 | | 0
A Pilot Evaluation of ChatGPT and DALL-E 2 on Decision Making and Spatial Reasoning | | 0
Dspy-based Neural-Symbolic Pipeline to Enhance Spatial Reasoning in LLMs | | 0
Architect: Generating Vivid and Interactive 3D Scenes with Hierarchical 2D Inpainting | | 0
Are Multimodal Large Language Models Ready for Omnidirectional Spatial Reasoning? | | 0
A Review of 3D Object Detection with Vision-Language Models | | 0
A Schema-Guided Reason-while-Retrieve framework for Reasoning on Scene Graphs with Large-Language-Models (LLMs) | | 0
A Self-Supervised Auxiliary Loss for Deep RL in Partially Observable Settings | | 0
A Solver-Aided Hierarchical Language for LLM-Driven CAD Design | | 0
ASPMT(QS): Non-Monotonic Spatial Reasoning with Answer Set Programming Modulo Theories | | 0
A Spoken Dialogue System for Spatial Question Answering in a Physical Blocks World | | 0
A Surprising Failure? Multimodal LLMs and the NLVR Challenge | | 0
A Survey for Foundation Models in Autonomous Driving | | 0
A Survey of Large Language Model-Powered Spatial Intelligence Across Scales: Advances in Embodied Agents, Smart Cities, and Earth Science | | 0
A Symbolic Representation of Human Posture for Interpretable Learning and Reasoning | | 0
Atari-GPT: Benchmarking Multimodal Large Language Models as Low-Level Policies in Atari Games | | 0
AuxDepthNet: Real-Time Monocular 3D Object Detection with Depth-Sensitive Features | | 0
A Vision Centric Remote Sensing Benchmark | | 0
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games | | 0
Beyond Human Vision: The Role of Large Vision Language Models in Microscope Image Analysis | | 0
Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models | | 0
Beyond Semantics: Rediscovering Spatial Awareness in Vision-Language Models | | 0
Beyond the Hype: A dispassionate look at vision-language models in medical scenario | | 0
Boosting Diffusion-Based Text Image Super-Resolution Model Towards Generalized Real-World Scenarios | | 0
Bridging Visualization and Optimization: Multimodal Large Language Models on Graph-Structured Combinatorial Optimization | | 0
ByDeWay: Boost Your multimodal LLM with DEpth prompting in a Training-Free Way | | 0
CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs | | 0
Can Large Language Models Create New Knowledge for Spatial Reasoning Tasks? | | 0
Can Large Multimodal Models Understand Agricultural Scenes? Benchmarking with AgroMind | | 0
Can LLM be a Good Path Planner based on Prompt Engineering? Mitigating the Hallucination for Path Planning | | 0
Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps | | 0
CASPER: Cognitive Architecture for Social Perception and Engagement in Robots | | 0
Chain of Semantics Programming in 3D Gaussian Splatting Representation for 3D Vision Grounding | | 0
Challenge of Spatial Cognition for Deep Learning | | 0
Challenges Faced by Large Language Models in Solving Multi-Agent Flocking | | 0
CleverDistiller: Simple and Spatially Consistent Cross-modal Distillation | | 0
Cog-GA: A Large Language Models-based Generative Agent for Vision-Language Navigation in Continuous Environments | | 0
Combining Deep Learning and Qualitative Spatial Reasoning to Learn Complex Structures from Sparse Examples with Noise | | 0
Page 5 of 10

No leaderboard results yet.