SOTAVerified

Spatial Reasoning

Papers

Showing 301–325 of 453 papers

Title | Status | Hype
Social Conjuring: Multi-User Runtime Collaboration with AI in Building Virtual 3D Worlds | | 0
Spatial Reasoning and Planning for Deep Embodied Agents | | 0
DARE: Diverse Visual Question Answering with Robustness Evaluation | | 0
Can Vision Language Models Learn from Visual Demonstrations of Ambiguous Spatial Reasoning? | Code | 0
Tag Map: A Text-Based Map for Spatial Reasoning and Navigation with Large Language Models | | 0
Enhancing Logical Reasoning in Large Language Models through Graph-based Synthetic Data | Code | 0
Reasoning Paths with Reference Objects Elicit Quantitative Spatial Reasoning in Large Vision-Language Models | | 0
ActionFlow: Equivariant, Accurate, and Efficient Policies with Spatially Symmetric Flow Matching | | 0
Cog-GA: A Large Language Models-based Generative Agent for Vision-Language Navigation in Continuous Environments | | 0
Atari-GPT: Benchmarking Multimodal Large Language Models as Low-Level Policies in Atari Games | | 0
AeroVerse: UAV-Agent Benchmark Suite for Simulating, Pre-training, Finetuning, and Evaluating Aerospace Embodied World Models | | 0
Poly2Vec: Polymorphic Fourier-Based Encoding of Geospatial Objects for GeoAI Applications | | 0
Can LLM be a Good Path Planner based on Prompt Engineering? Mitigating the Hallucination for Path Planning | | 0
Narrowing the Gap between Vision and Action in Navigation | Code | 0
Beyond the Hype: A dispassionate look at vision-language models in medical scenario | | 0
SceneGPT: A Language Model for 3D Scene Understanding | | 0
Space-LLaVA: a Vision-Language Model Adapted to Extraterrestrial Applications | | 0
REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models | | 0
Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model | | 0
I Know About "Up"! Enhancing Spatial Reasoning in Visual Language Models Through 3D Reconstruction | | 0
OpenSU3D: Open World 3D Scene Understanding using Foundation Models | | 0
A LLM Benchmark based on the Minecraft Builder Dialog Agent Task | | 0
Show, Don't Tell: Evaluating Large Language Models Beyond Textual Understanding with ChildPlay | Code | 0
GRASP: A Grid-Based Benchmark for Evaluating Commonsense Spatial Reasoning | | 0
FlowVQA: Mapping Multimodal Logic in Visual Question Answering with Flowcharts | | 0

No leaderboard results yet.