SOTAVerified

Spatial Reasoning

Papers

Showing 151–175 of 453 papers

| Title | Status | Hype |
|---|---|---|
| Leveraging LLMs for Mission Planning in Precision Agriculture | | 0 |
| A Multi-Modal Spatial Risk Framework for EV Charging Infrastructure Using Remote Sensing | | 0 |
| PhyBlock: A Progressive Benchmark for Physical Understanding and Planning via 3D Block Assembly | | 0 |
| Direct Numerical Layout Generation for 3D Indoor Scene Synthesis via Spatial Reasoning | | 0 |
| From Objects to Anywhere: A Holistic Benchmark for Multi-level Visual Grounding in 3D Scenes | | 0 |
| SAVVY: Spatial Awareness via Audio-Visual LLMs through Seeing and Hearing | | 0 |
| RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics | | 0 |
| ReSpace: Text-Driven 3D Scene Synthesis and Editing with Preference Alignment | | 0 |
| OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models | | 0 |
| In-the-wild Audio Spatialization with Flexible Text-guided Localization | Code | 0 |
| Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors | Code | 0 |
| Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces | | 0 |
| Out of Sight, Not Out of Context? Egocentric Spatial Reasoning in VLMs Across Disjoint Frames | | 0 |
| Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence | | 0 |
| Grounded Reinforcement Learning for Visual Reasoning | | 0 |
| MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence | | 0 |
| VLM Can Be a Good Assistant: Enhancing Embodied Visual Tracking with Self-Improving Vision-Language Models | | 0 |
| Jigsaw-Puzzles: From Seeing to Understanding to Reasoning in Vision-Language Models | | 0 |
| Agentic 3D Scene Generation with Spatially Contextualized VLMs | | 0 |
| ViTaPEs: Visuotactile Position Encodings for Cross-Modal Alignment in Multimodal Transformers | | 0 |
| MEBench: A Novel Benchmark for Understanding Mutual Exclusivity Bias in Vision-Language Models | | 0 |
| Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps | | 0 |
| Towards Dynamic 3D Reconstruction of Hand-Instrument Interaction in Ophthalmic Surgery | | 0 |
| U2-BENCH: Benchmarking Large Vision-Language Models on Ultrasound Understanding | | 0 |
| Bridging the Dynamic Perception Gap: Training-Free Draft Chain-of-Thought for Dynamic Multimodal Spatial Reasoning | Code | 0 |

No leaderboard results yet.