SOTAVerified

Spatial Reasoning

Papers

Showing 26–50 of 453 papers

| Title | Status | Hype |
| --- | --- | --- |
| SAVVY: Spatial Awareness via Audio-Visual LLMs through Seeing and Hearing | — | 0 |
| RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics | — | 0 |
| OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models | — | 0 |
| ReSpace: Text-Driven 3D Scene Synthesis and Editing with Preference Alignment | — | 0 |
| In-the-wild Audio Spatialization with Flexible Text-guided Localization | Code | 0 |
| Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces | — | 0 |
| Out of Sight, Not Out of Context? Egocentric Spatial Reasoning in VLMs Across Disjoint Frames | — | 0 |
| VideoCAD: A Large-Scale Video Dataset for Learning UI Interactions and 3D Reasoning from CAD Software | Code | 1 |
| Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors | Code | 0 |
| Seeing is Not Reasoning: MVPBench for Graph-based Evaluation of Multi-path Visual Physical CoT | Code | 1 |
| Uni-MuMER: Unified Multi-Task Fine-Tuning of Vision-Language Model for Handwritten Mathematical Expression Recognition | Code | 1 |
| MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence | — | 0 |
| Grounded Reinforcement Learning for Visual Reasoning | Code | 0 |
| ThinkGeo: Evaluating Tool-Augmented Agents for Remote Sensing Tasks | Code | 2 |
| Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence | — | 0 |
| ChatVLA-2: Vision-Language-Action Model with Open-World Embodied Reasoning from Pretrained Knowledge | Code | 1 |
| VLM Can Be a Good Assistant: Enhancing Embodied Visual Tracking with Self-Improving Vision-Language Models | — | 0 |
| Jigsaw-Puzzles: From Seeing to Understanding to Reasoning in Vision-Language Models | — | 0 |
| MineAnyBuild: Benchmarking Spatial Planning for Open-world AI Agents | Code | 1 |
| VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction | Code | 3 |
| MEBench: A Novel Benchmark for Understanding Mutual Exclusivity Bias in Vision-Language Models | — | 0 |
| Agentic 3D Scene Generation with Spatially Contextualized VLMs | — | 0 |
| ViTaPEs: Visuotactile Position Encodings for Cross-Modal Alignment in Multimodal Transformers | — | 0 |
| Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps | — | 0 |
| U2-BENCH: Benchmarking Large Vision-Language Models on Ultrasound Understanding | — | 0 |
Page 2 of 19

No leaderboard results yet.