SOTAVerified

Spatial Reasoning

Papers

Showing 301–350 of 453 papers

| Title | Status | Hype |
| --- | --- | --- |
| A Survey for Foundation Models in Autonomous Driving |  | 0 |
| Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data | Code | 0 |
| SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities |  | 0 |
| Seeing the roads through the trees: A benchmark for modeling spatial dependencies with aerial imagery | Code | 2 |
| StarCraftImage: A Dataset For Prototyping Spatial Reasoning Methods For Multi-Agent Environments |  | 0 |
| Distortions in Judged Spatial Relations in Large Language Models |  | 0 |
| Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame Benchmark | Code | 1 |
| Location Aware Modular Biencoder for Tourism Question Answering | Code | 0 |
| LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding |  | 0 |
| Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation | Code | 0 |
| Inherent limitations of LLMs regarding spatial information | Code | 0 |
| Exploring and Improving the Spatial Reasoning Abilities of Large Language Models |  | 0 |
| FollowEval: A Multi-Dimensional Benchmark for Assessing the Instruction-Following Capability of Large Language Models |  | 0 |
| On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving | Code | 2 |
| What's "up" with vision-language models? Investigating their struggle with spatial reasoning | Code | 1 |
| Disentangling Extraction and Reasoning in Multi-hop Spatial Reasoning | Code | 0 |
| DepWiGNN: A Depth-wise Graph Neural Network for Multi-hop Spatial Reasoning in Text | Code | 0 |
| Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning | Code | 1 |
| Evaluating Robustness of Visual Representations for Object Assembly Task Requiring Spatio-Geometrical Reasoning |  | 0 |
| Integrating Symbolic Reasoning into Neural Generative Models for Design Generation |  | 0 |
| SlotGNN: Unsupervised Discovery of Multi-Object Representations and Visual Dynamics |  | 0 |
| Can Large Language Models be Good Path Planners? A Benchmark and Investigation on Spatial-temporal Reasoning | Code | 1 |
| Improved Baselines with Visual Instruction Tuning | Code | 6 |
| Talk2BEV: Language-enhanced Bird's-eye View Maps for Autonomous Driving | Code | 1 |
| SmartPlay: A Benchmark for LLMs as Intelligent Agents | Code | 1 |
| An Evaluation of ChatGPT-4's Qualitative Spatial Reasoning Capabilities in RCC-8 |  | 0 |
| Dense 2D-3D Indoor Prediction with Sound via Aligned Cross-Modal Distillation | Code | 0 |
| Multi-camera Bird's Eye View Perception for Autonomous Driving |  | 0 |
| STUPD: A Synthetic Dataset for Spatial and Temporal Relation Reasoning | Code | 0 |
| DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions | Code | 1 |
| Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | Code | 5 |
| BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions | Code | 2 |
| Towards Grounded Visual Spatial Reasoning in Multi-Modal Vision Language Models |  | 0 |
| Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes | Code | 2 |
| Object Goal Navigation with Recursive Implicit Maps |  | 0 |
| Spatial Intelligence of a Self-driving Car and Rule-Based Decision Making |  | 0 |
| SpaceNLI: Evaluating the Consistency of Predicting Inferences in Space | Code | 0 |
| Act3D: 3D Feature Field Transformers for Multi-Task Robotic Manipulation | Code | 2 |
| A Universal Semantic-Geometric Representation for Robotic Manipulation | Code | 1 |
| Controllable Text-to-Image Generation with GPT-4 |  | 0 |
| Neural Task Synthesis for Visual Programming | Code | 0 |
| Improved Algorithms for Allen's Interval Algebra by Dynamic Programming with Sublinear Partitioning |  | 0 |
| EgoHumans: An Egocentric 3D Multi-Human Benchmark | Code | 0 |
| LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models | Code | 2 |
| From Patches to Objects: Exploiting Spatial Reasoning for Better Visual Representations |  | 0 |
| Contextual Reasoning for Scene Generation (Technical Report) |  | 0 |
| Dialectical language model evaluation: An initial appraisal of the commonsense spatial reasoning abilities of LLMs |  | 0 |
| MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models | Code | 7 |
| Visual Instruction Tuning | Code | 6 |
| Are LLMs the Master of All Trades? : Exploring Domain-Agnostic Reasoning Skills of LLMs | Code | 0 |

No leaderboard results yet.