SOTAVerified

Spatial Reasoning

Papers

Showing 301–350 of 453 papers

| Title | Status | Hype |
| Social Conjuring: Multi-User Runtime Collaboration with AI in Building Virtual 3D Worlds | | 0 |
| Spatial Reasoning and Planning for Deep Embodied Agents | | 0 |
| DARE: Diverse Visual Question Answering with Robustness Evaluation | | 0 |
| Can Vision Language Models Learn from Visual Demonstrations of Ambiguous Spatial Reasoning? | Code | 0 |
| Tag Map: A Text-Based Map for Spatial Reasoning and Navigation with Large Language Models | | 0 |
| Enhancing Logical Reasoning in Large Language Models through Graph-based Synthetic Data | Code | 0 |
| Reasoning Paths with Reference Objects Elicit Quantitative Spatial Reasoning in Large Vision-Language Models | | 0 |
| ActionFlow: Equivariant, Accurate, and Efficient Policies with Spatially Symmetric Flow Matching | | 0 |
| Cog-GA: A Large Language Models-based Generative Agent for Vision-Language Navigation in Continuous Environments | | 0 |
| Atari-GPT: Benchmarking Multimodal Large Language Models as Low-Level Policies in Atari Games | | 0 |
| AeroVerse: UAV-Agent Benchmark Suite for Simulating, Pre-training, Finetuning, and Evaluating Aerospace Embodied World Models | | 0 |
| Poly2Vec: Polymorphic Fourier-Based Encoding of Geospatial Objects for GeoAI Applications | | 0 |
| Can LLM be a Good Path Planner based on Prompt Engineering? Mitigating the Hallucination for Path Planning | | 0 |
| Narrowing the Gap between Vision and Action in Navigation | Code | 0 |
| Beyond the Hype: A dispassionate look at vision-language models in medical scenario | | 0 |
| SceneGPT: A Language Model for 3D Scene Understanding | | 0 |
| Space-LLaVA: a Vision-Language Model Adapted to Extraterrestrial Applications | | 0 |
| REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models | | 0 |
| Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model | | 0 |
| I Know About "Up"! Enhancing Spatial Reasoning in Visual Language Models Through 3D Reconstruction | | 0 |
| OpenSU3D: Open World 3D Scene Understanding using Foundation Models | | 0 |
| A LLM Benchmark based on the Minecraft Builder Dialog Agent Task | | 0 |
| Show, Don't Tell: Evaluating Large Language Models Beyond Textual Understanding with ChildPlay | Code | 0 |
| GRASP: A Grid-Based Benchmark for Evaluating Commonsense Spatial Reasoning | | 0 |
| FlowVQA: Mapping Multimodal Logic in Visual Question Answering with Flowcharts | | 0 |
| Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities | | 0 |
| GSR-BENCH: A Benchmark for Grounded Spatial Reasoning Evaluation via Multimodal LLMs | | 0 |
| Neuro-symbolic Training for Reasoning over Spatial Language | Code | 0 |
| WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences | | 0 |
| RoboPoint: A Vision-Language Model for Spatial Affordance Prediction for Robotics | | 0 |
| SpaRC and SpaRP: Spatial Reasoning Characterization and Path Generation for Understanding Spatial Reasoning Capability of Large Language Models | Code | 0 |
| Quantifying Geospatial in the Common Crawl Corpus | | 0 |
| SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models | | 0 |
| Reframing Spatial Reasoning Evaluation in Language Models: A Real-World Simulation Benchmark for Qualitative Reasoning | Code | 0 |
| Can Large Language Models Create New Knowledge for Spatial Reasoning Tasks? | | 0 |
| Generating Human Motion in 3D Scenes from Text Descriptions | | 0 |
| RoboHop: Segment-based Topological Map Representation for Open-World Visual Navigation | | 0 |
| Beyond Human Vision: The Role of Large Vision Language Models in Microscope Image Analysis | | 0 |
| Re-Thinking Inverse Graphics With Large Language Models | | 0 |
| Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs | Code | 0 |
| HAMMR: HierArchical MultiModal React agents for generic VQA | | 0 |
| Challenges Faced by Large Language Models in Solving Multi-Agent Flocking | | 0 |
| Grounding Spatial Relations in Text-Only Language Models | Code | 0 |
| SpatialPIN: Enhancing Spatial Reasoning Capabilities of Vision-Language Models through Prompting and Interacting 3D Priors | | 0 |
| JSTR: Joint Spatio-Temporal Reasoning for Event-based Moving Object Detection | | 0 |
| DivCon: Divide and Conquer for Progressive Text-to-Image Generation | | 0 |
| Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training | | 0 |
| A Surprising Failure? Multimodal LLMs and the NLVR Challenge | | 0 |
| DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models | | 0 |
| PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs | | 0 |
Page 7 of 10

No leaderboard results yet.