SOTAVerified

Spatial Reasoning

Papers

Showing 101–150 of 453 papers

Title | Status | Hype
VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions | Code | 1
Warehouse Spatial Question Answering with LLM Agent | Code | 1
CityEQA: A Hierarchical LLM Agent on Embodied Question Answering Benchmark in City Space | Code | 1
Unfolding Spatial Cognition: Evaluating Multimodal Models on Visual Simulations | Code | 1
Uni-MuMER: Unified Multi-Task Fine-Tuning of Vision-Language Model for Handwritten Mathematical Expression Recognition | Code | 1
On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability | Code | 1
ING-VP: MLLMs cannot Play Easy Vision-based Games Yet | Code | 1
Open3DVQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space | Code | 1
Improved Visual-Spatial Reasoning via R1-Zero-Like Training | Code | 1
Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities | Code | 1
IndoNLI: A Natural Language Inference Dataset for Indonesian | Code | 1
Towards Visuospatial Cognition via Hierarchical Fusion of Visual Experts | Code | 1
TopViewRS: Vision-Language Models as Top-View Spatial Reasoners | Code | 1
ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension | Code | 1
Capturing Shape Information with Multi-Scale Topological Loss Terms for 3D Reconstruction | Code | 1
Touchdown: Natural Language Navigation and Spatial Reasoning in Visual Street Environments | Code | 1
Translating Natural Language to Planning Goals with Large-Language Models | Code | 1
Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization | Code | 1
Teaching Agents how to Map: Spatial Reasoning for Multi-Object Navigation | Code | 1
Are Deep Neural Networks SMARTer than Second Graders? | Code | 1
Grounding Consistency: Distilling Spatial Common Sense for Precise Visual Relationship Detection | Code | 1
SpatialSense: An Adversarially Crowdsourced Benchmark for Spatial Relation Recognition | Code | 1
StepGame: A New Benchmark for Robust Multi-Hop Spatial Reasoning in Texts | Code | 1
Enhancing Reasoning to Adapt Large Language Models for Domain-Specific Applications | Code | 1
SPARTQA: A Textual Question Answering Benchmark for Spatial Reasoning | Code | 1
Spatially Aware Multimodal Transformers for TextVQA | Code | 1
End-to-End Egospheric Spatial Memory | Code | 1
Can Large Language Models be Good Path Planners? A Benchmark and Investigation on Spatial-temporal Reasoning | Code | 1
Geospatial Mechanistic Interpretability of Large Language Models | Code | 1
MineAnyBuild: Benchmarking Spatial Planning for Open-world AI Agents | Code | 1
SPARE3D: A Dataset for SPAtial REasoning on Three-View Line Drawings | Code | 1
SpartQA: A Textual Question Answering Benchmark for Spatial Reasoning | Code | 1
VideoCAD: A Large-Scale Video Dataset for Learning UI Interactions and 3D Reasoning from CAD Software | Code | 1
From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation | Code | 1
Show, Don't Tell: Evaluating Large Language Models Beyond Textual Understanding with ChildPlay | Code | 0
Bridging the Dynamic Perception Gap: Training-Free Draft Chain-of-Thought for Dynamic Multimodal Spatial Reasoning | Code | 0
Scaling RL to Long Videos | Code | 0
SORNet: Spatial Object-Centric Representations for Sequential Manipulation | Code | 0
EgoHumans: An Egocentric 3D Multi-Human Benchmark | Code | 0
Representation Learning for Grounded Spatial Reasoning | Code | 0
Reframing Spatial Reasoning Evaluation in Language Models: A Real-World Simulation Benchmark for Qualitative Reasoning | Code | 0
OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding | Code | 0
Disentangling Extraction and Reasoning in Multi-hop Spatial Reasoning | Code | 0
Neuro-symbolic Training for Reasoning over Spatial Language | Code | 0
DetailMaster: Can Your Text-to-Image Model Handle Long Prompts? | Code | 0
No Blind Spots: Full-Surround Multi-Object Tracking for Autonomous Vehicles using Cameras & LiDARs | Code | 0
Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark | Code | 0
DepWiGNN: A Depth-wise Graph Neural Network for Multi-hop Spatial Reasoning in Text | Code | 0
Dense 2D-3D Indoor Prediction with Sound via Aligned Cross-Modal Distillation | Code | 0
3D CoCa: Contrastive Learners are 3D Captioners | Code | 0

No leaderboard results yet.