SOTAVerified

Spatial Reasoning

Papers

Showing 101-150 of 453 papers

Title | Status | Hype
Talk2BEV: Language-enhanced Bird's-eye View Maps for Autonomous Driving | Code | 1
SmartPlay: A Benchmark for LLMs as Intelligent Agents | Code | 1
DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions | Code | 1
A Universal Semantic-Geometric Representation for Robotic Manipulation | Code | 1
Translating Natural Language to Planning Goals with Large-Language Models | Code | 1
Are Deep Neural Networks SMARTer than Second Graders? | Code | 1
Visual Spatial Reasoning | Code | 1
StepGame: A New Benchmark for Robust Multi-Hop Spatial Reasoning in Texts | Code | 1
ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension | Code | 1
Capturing Shape Information with Multi-Scale Topological Loss Terms for 3D Reconstruction | Code | 1
Revisiting spatio-temporal layouts for compositional action recognition | Code | 1
IndoNLI: A Natural Language Inference Dataset for Indonesian | Code | 1
CLIPort: What and Where Pathways for Robotic Manipulation | Code | 1
Teaching Agents how to Map: Spatial Reasoning for Multi-Object Navigation | Code | 1
SPARTQA: A Textual Question Answering Benchmark for Spatial Reasoning | Code | 1
SBEVNet: End-to-End Deep Stereo Layout Estimation | Code | 1
Self-supervised Spatial Reasoning on Multi-View Line Drawings | Code | 1
SpartQA: A Textual Question Answering Benchmark for Spatial Reasoning | Code | 1
End-to-End Egospheric Spatial Memory | Code | 1
Multi-scale GCN-assisted two-stage network for joint segmentation of retinal layers and disc in peripapillary OCT images | Code | 1
Grounding Consistency: Distilling Spatial Common Sense for Precise Visual Relationship Detection | Code | 1
Long Range Arena: A Benchmark for Efficient Transformers | Code | 1
Decoding Language Spatial Relations to 2D Spatial Arrangements | Code | 1
BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues | Code | 1
Joint Spatio-Textual Reasoning for Answering Tourism Questions | Code | 1
Spatially Aware Multimodal Transformers for TextVQA | Code | 1
Learning and Reasoning with the Graph Structure Representation in Robotic Surgery | Code | 1
SE-KGE: A Location-Aware Knowledge Graph Embedding Model for Geographic Question Answering and Spatial Semantic Lifting | Code | 1
SPARE3D: A Dataset for SPAtial REasoning on Three-View Line Drawings | Code | 1
Pix2Shape: Towards Unsupervised Learning of 3D Scenes from Images using a View-based Representation | Code | 1
VSGNet: Spatial Attention Network for Detecting Human Object Interactions Using Graph Convolutions | Code | 1
SpatialSense: An Adversarially Crowdsourced Benchmark for Spatial Relation Recognition | Code | 1
Touchdown: Natural Language Navigation and Spatial Reasoning in Visual Street Environments | Code | 1
GuessWhat?! Visual object discovery through multi-modal dialogue | Code | 1
MindJourney: Test-Time Scaling with World Models for Spatial Reasoning | - | 0
EmbRACE-3K: Embodied Reasoning and Action in Complex Environments | - | 0
ByDeWay: Boost Your multimodal LLM with DEpth prompting in a Training-Free Way | - | 0
M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning | - | 0
Scaling RL to Long Videos | Code | 0
OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding | Code | 0
A Neural Representation Framework with LLM-Driven Spatial Reasoning for Open-Vocabulary 3D Visual Grounding | - | 0
Optimising Language Models for Downstream Tasks: A Post-Training Perspective | - | 0
ImplicitQA: Going beyond frames towards Implicit Video Reasoning | Code | 0
World-aware Planning Narratives Enhance Large Vision-Language Model Planner | - | 0
ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models | - | 0
From 2D to 3D Cognition: A Brief Survey of General World Models | - | 0
Video Perception Models for 3D Scene Synthesis | - | 0
ImmerseGen: Agent-Guided Immersive World Generation with Alpha-Textured Proxies | - | 0
SIRI-Bench: Challenging VLMs' Spatial Intelligence through Complex Reasoning Tasks | - | 0
PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning | - | 0

No leaderboard results yet.