SOTAVerified

Spatial Reasoning

Papers

Showing 51-100 of 453 papers

Title | Status | Hype
AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO | Code | 2
Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead | Code | 2
Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes | Code | 2
SpartQA: A Textual Question Answering Benchmark for Spatial Reasoning | Code | 1
SPARE3D: A Dataset for SPAtial REasoning on Three-View Line Drawings | Code | 1
SPARTQA: A Textual Question Answering Benchmark for Spatial Reasoning | Code | 1
BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues | Code | 1
iVISPAR -- An Interactive Visual-Spatial Reasoning Benchmark for VLMs | Code | 1
Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame Benchmark | Code | 1
Knot So Simple: A Minimalistic Environment for Spatial Reasoning | Code | 1
Learning and Reasoning with the Graph Structure Representation in Robotic Surgery | Code | 1
Learning Action and Reasoning-Centric Image Editing from Videos and Simulations | Code | 1
Joint Spatio-Textual Reasoning for Answering Tourism Questions | Code | 1
ING-VP: MLLMs cannot Play Easy Vision-based Games Yet | Code | 1
An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models | Code | 1
SmartPlay: A Benchmark for LLMs as Intelligent Agents | Code | 1
IndoNLI: A Natural Language Inference Dataset for Indonesian | Code | 1
A Universal Semantic-Geometric Representation for Robotic Manipulation | Code | 1
SE-KGE: A Location-Aware Knowledge Graph Embedding Model for Geographic Question Answering and Spatial Semantic Lifting | Code | 1
HSPFormer: Hierarchical Spatial Perception Transformer for Semantic Segmentation | Code | 1
DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding | Code | 1
Improved Visual-Spatial Reasoning via R1-Zero-Like Training | Code | 1
LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments | Code | 1
SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding | Code | 1
Spatially Aware Multimodal Transformers for TextVQA | Code | 1
ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting | Code | 1
3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation | Code | 1
Revisiting spatio-temporal layouts for compositional action recognition | Code | 1
CoNav: Collaborative Cross-Modal Reasoning for Embodied Navigation | Code | 1
AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding | Code | 1
GuessWhat?! Visual object discovery through multi-modal dialogue | Code | 1
SBEVNet: End-to-End Deep Stereo Layout Estimation | Code | 1
From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation | Code | 1
Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models | Code | 1
CLIPort: What and Where Pathways for Robotic Manipulation | Code | 1
Geospatial Mechanistic Interpretability of Large Language Models | Code | 1
CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory | Code | 1
CityGPT: Empowering Urban Spatial Cognition of Large Language Models | Code | 1
ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension | Code | 1
Seeing is Not Reasoning: MVPBench for Graph-based Evaluation of Multi-path Visual Physical CoT | Code | 1
CityEQA: A Hierarchical LLM Agent on Embodied Question Answering Benchmark in City Space | Code | 1
Grounded Chain-of-Thought for Multimodal Large Language Models | Code | 1
Self-supervised Spatial Reasoning on Multi-View Line Drawings | Code | 1
Grounding Consistency: Distilling Spatial Common Sense for Precise Visual Relationship Detection | Code | 1
Enhancing Reasoning to Adapt Large Language Models for Domain-Specific Applications | Code | 1
OpenKD: Opening Prompt Diversity for Zero- and Few-shot Keypoint Detection | Code | 1
Open3DVQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space | Code | 1
Pix2Shape: Towards Unsupervised Learning of 3D Scenes from Images using a View-based Representation | Code | 1
Decoding Language Spatial Relations to 2D Spatial Arrangements | Code | 1
NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models | Code | 1

No leaderboard results yet.