SOTAVerified

Spatial Reasoning

Papers

Showing 51–100 of 453 papers

Title | Status | Hype
Act3D: 3D Feature Field Transformers for Multi-Task Robotic Manipulation | Code | 2
LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models | Code | 2
ConceptFusion: Open-set Multimodal 3D Mapping | Code | 2
Warehouse Spatial Question Answering with LLM Agent | Code | 1
3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation | Code | 1
Unfolding Spatial Cognition: Evaluating Multimodal Models on Visual Simulations | Code | 1
VideoCAD: A Large-Scale Video Dataset for Learning UI Interactions and 3D Reasoning from CAD Software | Code | 1
Seeing is Not Reasoning: MVPBench for Graph-based Evaluation of Multi-path Visual Physical CoT | Code | 1
Uni-MuMER: Unified Multi-Task Fine-Tuning of Vision-Language Model for Handwritten Mathematical Expression Recognition | Code | 1
ChatVLA-2: Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge | Code | 1
MineAnyBuild: Benchmarking Spatial Planning for Open-world AI Agents | Code | 1
Knot So Simple: A Minimalistic Environment for Spatial Reasoning | Code | 1
CoNav: Collaborative Cross-Modal Reasoning for Embodied Navigation | Code | 1
Towards Visuospatial Cognition via Hierarchical Fusion of Visual Experts | Code | 1
Visuospatial Cognitive Assistant | Code | 1
From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation | Code | 1
CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory | Code | 1
Geospatial Mechanistic Interpretability of Large Language Models | Code | 1
Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization | Code | 1
SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding | Code | 1
Improved Visual-Spatial Reasoning via R1-Zero-Like Training | Code | 1
Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models | Code | 1
NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models | Code | 1
Grounded Chain-of-Thought for Multimodal Large Language Models | Code | 1
Logic-RAG: Augmenting Large Multimodal Models with Visual-Spatial Knowledge for Road Scene Understanding | Code | 1
VISO-Grasp: Vision-Language Informed Spatial Object-centric 6-DoF Active View Planning and Grasping in Clutter and Invisibility | Code | 1
Open3DVQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space | Code | 1
CityEQA: A Hierarchical LLM Agent on Embodied Question Answering Benchmark in City Space | Code | 1
Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models | Code | 1
iVISPAR -- An Interactive Visual-Spatial Reasoning Benchmark for VLMs | Code | 1
Enhancing Reasoning to Adapt Large Language Models for Domain-Specific Applications | Code | 1
HSPFormer: Hierarchical Spatial Perception Transformer for Semantic Segmentation | Code | 1
An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models | Code | 1
ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting | Code | 1
Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities | Code | 1
ING-VP: MLLMs cannot Play Easy Vision-based Games Yet | Code | 1
VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs | Code | 1
OpenKD: Opening Prompt Diversity for Zero- and Few-shot Keypoint Detection | Code | 1
On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability | Code | 1
Learning Action and Reasoning-Centric Image Editing from Videos and Simulations | Code | 1
CityGPT: Empowering Urban Spatial Cognition of Large Language Models | Code | 1
AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding | Code | 1
TopViewRS: Vision-Language Models as Top-View Spatial Reasoners | Code | 1
DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding | Code | 1
Mind's Eye of LLMs: Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models | Code | 1
LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments | Code | 1
Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame Benchmark | Code | 1
What's "up" with vision-language models? Investigating their struggle with spatial reasoning | Code | 1
Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning | Code | 1
Can Large Language Models be Good Path Planners? A Benchmark and Investigation on Spatial-temporal Reasoning | Code | 1
Page 2 of 10