SOTAVerified

Spatial Reasoning

Papers

Showing 251-300 of 453 papers

Title | Status | Hype
WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences | | 0
World-aware Planning Narratives Enhance Large Vision-Language Model Planner | | 0
Perturbed State Space Feature Encoders for Optical Flow with Event Cameras | | 0
REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models | | 0
A Light and Smart Wearable Platform with Multimodal Foundation Model for Enhanced Spatial Reasoning in People with Blindness and Low Vision | | 0
Leveraging LLMs for Mission Planning in Precision Agriculture | | 0
3D-MoE: A Mixture-of-Experts Multi-modal LLM for 3D Vision and Pose Diffusion via Rectified Flow | | 0
3DSRBench: A Comprehensive 3D Spatial Reasoning Benchmark | | 0
A Call for New Recipes to Enhance Spatial Reasoning in MLLMs | | 0
ActionFlow: Equivariant, Accurate, and Efficient Policies with Spatially Symmetric Flow Matching | | 0
Space-LLaVA: a Vision-Language Model Adapted to Extraterrestrial Applications | | 0
A dual contrastive framework | | 0
Advancing Egocentric Video Question Answering with Multimodal Large Language Models | | 0
AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations | | 0
Aerial Vision-and-Language Navigation via Semantic-Topo-Metric Representation Guided LLM Reasoning | | 0
AeroVerse: UAV-Agent Benchmark Suite for Simulating, Pre-training, Finetuning, and Evaluating Aerospace Embodied World Models | | 0
Aether: Geometric-Aware Unified World Modeling | | 0
Agentic 3D Scene Generation with Spatially Contextualized VLMs | | 0
AI's Spatial Intelligence: Evaluating AI's Understanding of Spatial Transformations in PSVT:R and Augmented Reality | | 0
A LLM Benchmark based on the Minecraft Builder Dialog Agent Task | | 0
AlphaSpace: Enabling Robotic Actions through Semantic Tokenization and Symbolic Reasoning | | 0
A Multi-Modal Spatial Risk Framework for EV Charging Infrastructure Using Remote Sensing | | 0
An Empirical Study of Conformal Prediction in LLM with ASP Scaffolds for Robust Reasoning | | 0
A Neural Representation Framework with LLM-Driven Spatial Reasoning for Open-Vocabulary 3D Visual Grounding | | 0
An Evaluation of ChatGPT-4's Qualitative Spatial Reasoning Capabilities in RCC-8 | | 0
A Pilot Evaluation of ChatGPT and DALL-E 2 on Decision Making and Spatial Reasoning | | 0
Dspy-based Neural-Symbolic Pipeline to Enhance Spatial Reasoning in LLMs | | 0
Architect: Generating Vivid and Interactive 3D Scenes with Hierarchical 2D Inpainting | | 0
Are Multimodal Large Language Models Ready for Omnidirectional Spatial Reasoning? | | 0
A Review of 3D Object Detection with Vision-Language Models | | 0
A Schema-Guided Reason-while-Retrieve framework for Reasoning on Scene Graphs with Large-Language-Models (LLMs) | | 0
A Self-Supervised Auxiliary Loss for Deep RL in Partially Observable Settings | | 0
A Solver-Aided Hierarchical Language for LLM-Driven CAD Design | | 0
ASPMT(QS): Non-Monotonic Spatial Reasoning with Answer Set Programming Modulo Theories | | 0
A Spoken Dialogue System for Spatial Question Answering in a Physical Blocks World | | 0
A Surprising Failure? Multimodal LLMs and the NLVR Challenge | | 0
A Survey for Foundation Models in Autonomous Driving | | 0
A Survey of Large Language Model-Powered Spatial Intelligence Across Scales: Advances in Embodied Agents, Smart Cities, and Earth Science | | 0
A Symbolic Representation of Human Posture for Interpretable Learning and Reasoning | | 0
Atari-GPT: Benchmarking Multimodal Large Language Models as Low-Level Policies in Atari Games | | 0
AuxDepthNet: Real-Time Monocular 3D Object Detection with Depth-Sensitive Features | | 0
A Vision Centric Remote Sensing Benchmark | | 0
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games | | 0
Beyond Human Vision: The Role of Large Vision Language Models in Microscope Image Analysis | | 0
Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models | | 0
Beyond Semantics: Rediscovering Spatial Awareness in Vision-Language Models | | 0
Beyond the Hype: A dispassionate look at vision-language models in medical scenario | | 0
Boosting Diffusion-Based Text Image Super-Resolution Model Towards Generalized Real-World Scenarios | | 0
Bridging Visualization and Optimization: Multimodal Large Language Models on Graph-Structured Combinatorial Optimization | | 0
ByDeWay: Boost Your multimodal LLM with DEpth prompting in a Training-Free Way | | 0

No leaderboard results yet.