SOTAVerified

Spatial Reasoning

Papers

Showing 201-250 of 453 papers

Title | Status | Hype
3DSRBench: A Comprehensive 3D Spatial Reasoning Benchmark | - | 0
SAT: Dynamic Spatial Aptitude Training for Multimodal Language Models | Code | 4
TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action | Code | 2
VideoSAVi: Self-Aligned Video Language Models without Human Supervision | - | 0
Can Large Language Models Reason about the Region Connection Calculus? | Code | 0
Grid-augmented vision: A simple yet effective approach for enhanced spatial understanding in multi-modal agents | Code | 0
Dspy-based Neural-Symbolic Pipeline to Enhance Spatial Reasoning in LLMs | - | 0
CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos | Code | 3
APT: Architectural Planning and Text-to-Blueprint Construction Using Large Language Models for Open-World Agents | Code | 0
Probing the limitations of multimodal language models for chemistry and materials research | Code | 2
TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation | - | 0
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics | - | 0
DriveMLLM: A Benchmark for Spatial Understanding with Multimodal Large Language Models in Autonomous Driving | Code | 2
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games | - | 0
Visual-Linguistic Agent: Towards Collaborative Contextual Object Reasoning | - | 0
Architect: Generating Vivid and Interactive 3D Scenes with Hierarchical 2D Inpainting | - | 0
AI's Spatial Intelligence: Evaluating AI's Understanding of Spatial Transformations in PSVT:R and Augmented Reality | - | 0
An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models | Code | 1
End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering | Code | 2
GPT-4o System Card | - | 0
Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction | - | 0
Geometric Feature Enhanced Knowledge Graph Embedding and Spatial Reasoning | - | 0
ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting | Code | 1
Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities | Code | 1
Sparkle: Mastering Basic Spatial Capabilities in Vision Language Models Elicits Generalization to Composite Spatial Reasoning | - | 0
Locality Alignment Improves Vision-Language Models | Code | 2
Testing GPT-4-o1-preview on math and science problems: A follow-up study | - | 0
Aerial Vision-and-Language Navigation via Semantic-Topo-Metric Representation Guided LLM Reasoning | - | 0
Structured Spatial Reasoning with Open Vocabulary Object Detectors | - | 0
ING-VP: MLLMs cannot Play Easy Vision-based Games Yet | Code | 1
Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark | Code | 0
Evaluation of Code LLMs on Geospatial Code Generation | Code | 0
SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Models | - | 0
Social Conjuring: Multi-User Runtime Collaboration with AI in Building Virtual 3D Worlds | - | 0
OpenKD: Opening Prompt Diversity for Zero- and Few-shot Keypoint Detection | Code | 1
VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs | Code | 1
On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability | Code | 1
Spatial Reasoning and Planning for Deep Embodied Agents | - | 0
DARE: Diverse Visual Question Answering with Robustness Evaluation | - | 0
Can Vision Language Models Learn from Visual Demonstrations of Ambiguous Spatial Reasoning? | Code | 0
Tag Map: A Text-Based Map for Spatial Reasoning and Navigation with Large Language Models | - | 0
Enhancing Logical Reasoning in Large Language Models through Graph-based Synthetic Data | Code | 0
Reasoning Paths with Reference Objects Elicit Quantitative Spatial Reasoning in Large Vision-Language Models | - | 0
ActionFlow: Equivariant, Accurate, and Efficient Policies with Spatially Symmetric Flow Matching | - | 0
Cog-GA: A Large Language Models-based Generative Agent for Vision-Language Navigation in Continuous Environments | - | 0
AeroVerse: UAV-Agent Benchmark Suite for Simulating, Pre-training, Finetuning, and Evaluating Aerospace Embodied World Models | - | 0
Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation | Code | 2
Atari-GPT: Benchmarking Multimodal Large Language Models as Low-Level Policies in Atari Games | - | 0
Poly2Vec: Polymorphic Fourier-Based Encoding of Geospatial Objects for GeoAI Applications | - | 0
Can LLM be a Good Path Planner based on Prompt Engineering? Mitigating the Hallucination for Path Planning | - | 0

No leaderboard results yet.