SOTAVerified

Spatial Reasoning

Papers

Showing 51-75 of 453 papers

Title | Status | Hype
Embodied-R: Collaborative Framework for Activating Embodied Spatial Reasoning in Foundation Models via Reinforcement Learning | Code | 2
ThinkGeo: Evaluating Tool-Augmented Agents for Remote Sensing Tasks | Code | 2
End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering | Code | 2
NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models | Code | 1
MineAnyBuild: Benchmarking Spatial Planning for Open-world AI Agents | Code | 1
On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability | Code | 1
BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues | Code | 1
Long Range Arena: A Benchmark for Efficient Transformers | Code | 1
Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame Benchmark | Code | 1
Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models | Code | 1
CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory | Code | 1
Multi-scale GCN-assisted two-stage network for joint segmentation of retinal layers and disc in peripapillary OCT images | Code | 1
Logic-RAG: Augmenting Large Multimodal Models with Visual-Spatial Knowledge for Road Scene Understanding | Code | 1
Learning Action and Reasoning-Centric Image Editing from Videos and Simulations | Code | 1
An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models | Code | 1
Learning and Reasoning with the Graph Structure Representation in Robotic Surgery | Code | 1
Joint Spatio-Textual Reasoning for Answering Tourism Questions | Code | 1
iVISPAR -- An Interactive Visual-Spatial Reasoning Benchmark for VLMs | Code | 1
Knot So Simple: A Minimalistic Environment for Spatial Reasoning | Code | 1
LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments | Code | 1
A Universal Semantic-Geometric Representation for Robotic Manipulation | Code | 1
ING-VP: MLLMs cannot Play Easy Vision-based Games Yet | Code | 1
Open3DVQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space | Code | 1
3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation | Code | 1
Improved Visual-Spatial Reasoning via R1-Zero-Like Training | Code | 1
Page 3 of 19

No leaderboard results yet.