SOTAVerified

Spatial Reasoning

Papers

Showing 76–100 of 453 papers

Title | Status | Hype
ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting | Code | 1
3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation | Code | 1
Revisiting spatio-temporal layouts for compositional action recognition | Code | 1
CoNav: Collaborative Cross-Modal Reasoning for Embodied Navigation | Code | 1
AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding | Code | 1
GuessWhat?! Visual object discovery through multi-modal dialogue | Code | 1
SBEVNet: End-to-End Deep Stereo Layout Estimation | Code | 1
From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation | Code | 1
Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models | Code | 1
CLIPort: What and Where Pathways for Robotic Manipulation | Code | 1
Geospatial Mechanistic Interpretability of Large Language Models | Code | 1
CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory | Code | 1
CityGPT: Empowering Urban Spatial Cognition of Large Language Models | Code | 1
ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension | Code | 1
Seeing is Not Reasoning: MVPBench for Graph-based Evaluation of Multi-path Visual Physical CoT | Code | 1
CityEQA: A Hierarchical LLM Agent on Embodied Question Answering Benchmark in City Space | Code | 1
Grounded Chain-of-Thought for Multimodal Large Language Models | Code | 1
Self-supervised Spatial Reasoning on Multi-View Line Drawings | Code | 1
Grounding Consistency: Distilling Spatial Common Sense for Precise Visual Relationship Detection | Code | 1
Enhancing Reasoning to Adapt Large Language Models for Domain-Specific Applications | Code | 1
OpenKD: Opening Prompt Diversity for Zero- and Few-shot Keypoint Detection | Code | 1
Open3DVQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space | Code | 1
Pix2Shape: Towards Unsupervised Learning of 3D Scenes from Images using a View-based Representation | Code | 1
Decoding Language Spatial Relations to 2D Spatial Arrangements | Code | 1
NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models | Code | 1

No leaderboard results yet.