SOTAVerified

Spatial Reasoning

Papers

Showing 201-225 of 453 papers

Title | Status | Hype
3DSRBench: A Comprehensive 3D Spatial Reasoning Benchmark | | 0
SAT: Dynamic Spatial Aptitude Training for Multimodal Language Models | Code | 4
TACO: Learning Multi-modal Action Models with Synthetic Chains-of-Thought-and-Action | Code | 2
VideoSAVi: Self-Aligned Video Language Models without Human Supervision | | 0
Can Large Language Models Reason about the Region Connection Calculus? | Code | 0
Grid-augmented vision: A simple yet effective approach for enhanced spatial understanding in multi-modal agents | Code | 0
Dspy-based Neural-Symbolic Pipeline to Enhance Spatial Reasoning in LLMs | | 0
CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos | Code | 3
APT: Architectural Planning and Text-to-Blueprint Construction Using Large Language Models for Open-World Agents | Code | 0
Probing the limitations of multimodal language models for chemistry and materials research | Code | 2
TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation | | 0
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics | | 0
DriveMLLM: A Benchmark for Spatial Understanding with Multimodal Large Language Models in Autonomous Driving | Code | 2
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games | | 0
Visual-Linguistic Agent: Towards Collaborative Contextual Object Reasoning | | 0
Architect: Generating Vivid and Interactive 3D Scenes with Hierarchical 2D Inpainting | | 0
AI's Spatial Intelligence: Evaluating AI's Understanding of Spatial Transformations in PSVT:R and Augmented Reality | | 0
An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models | Code | 1
End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering | Code | 2
GPT-4o System Card | | 0
Geometric Feature Enhanced Knowledge Graph Embedding and Spatial Reasoning | | 0
Where Am I and What Will I See: An Auto-Regressive Model for Spatial Localization and View Prediction | | 0
ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting | Code | 1
Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities | Code | 1
Sparkle: Mastering Basic Spatial Capabilities in Vision Language Models Elicits Generalization to Composite Spatial Reasoning | | 0

No leaderboard results yet.