SOTAVerified

Spatial Reasoning

Papers

Showing 51-75 of 453 papers

Title | Status | Hype
Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes | Code | 2
Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models | Code | 2
Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead | Code | 2
NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models | Code | 1
Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame Benchmark | Code | 1
On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability | Code | 1
BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues | Code | 1
GuessWhat?! Visual object discovery through multi-modal dialogue | Code | 1
MineAnyBuild: Benchmarking Spatial Planning for Open-world AI Agents | Code | 1
Grounded Chain-of-Thought for Multimodal Large Language Models | Code | 1
Multi-scale GCN-assisted two-stage network for joint segmentation of retinal layers and disc in peripapillary OCT images | Code | 1
Grounding Consistency: Distilling Spatial Common Sense for Precise Visual Relationship Detection | Code | 1
An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models | Code | 1
From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation | Code | 1
Long Range Arena: A Benchmark for Efficient Transformers | Code | 1
A Universal Semantic-Geometric Representation for Robotic Manipulation | Code | 1
Geospatial Mechanistic Interpretability of Large Language Models | Code | 1
Logic-RAG: Augmenting Large Multimodal Models with Visual-Spatial Knowledge for Road Scene Understanding | Code | 1
Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models | Code | 1
Open3DVQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space | Code | 1
3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation | Code | 1
Knot So Simple: A Minimalistic Environment for Spatial Reasoning | Code | 1
End-to-End Egospheric Spatial Memory | Code | 1
AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding | Code | 1
Enhancing Reasoning to Adapt Large Language Models for Domain-Specific Applications | Code | 1

No leaderboard results yet.