SOTAVerified

Spatial Reasoning

Papers

Showing 101-150 of 453 papers

| Title | Status | Hype |
| --- | --- | --- |
| Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning | Code | 1 |
| CityEQA: A Hierarchical LLM Agent on Embodied Question Answering Benchmark in City Space | Code | 1 |
| Unfolding Spatial Cognition: Evaluating Multimodal Models on Visual Simulations | Code | 1 |
| Uni-MuMER: Unified Multi-Task Fine-Tuning of Vision-Language Model for Handwritten Mathematical Expression Recognition | Code | 1 |
| OpenKD: Opening Prompt Diversity for Zero- and Few-shot Keypoint Detection | Code | 1 |
| Pix2Shape: Towards Unsupervised Learning of 3D Scenes from Images using a View-based Representation | Code | 1 |
| Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization | Code | 1 |
| IndoNLI: A Natural Language Inference Dataset for Indonesian | Code | 1 |
| Improved Visual-Spatial Reasoning via R1-Zero-Like Training | Code | 1 |
| Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities | Code | 1 |
| Towards Visuospatial Cognition via Hierarchical Fusion of Visual Experts | Code | 1 |
| ING-VP: MLLMs cannot Play Easy Vision-based Games Yet | Code | 1 |
| Capturing Shape Information with Multi-Scale Topological Loss Terms for 3D Reconstruction | Code | 1 |
| Touchdown: Natural Language Navigation and Spatial Reasoning in Visual Street Environments | Code | 1 |
| Translating Natural Language to Planning Goals with Large-Language Models | Code | 1 |
| SBEVNet: End-to-End Deep Stereo Layout Estimation | Code | 1 |
| VideoCAD: A Large-Scale Video Dataset for Learning UI Interactions and 3D Reasoning from CAD Software | Code | 1 |
| Grounding Consistency: Distilling Spatial Common Sense for Precise Visual Relationship Detection | Code | 1 |
| Are Deep Neural Networks SMARTer than Second Graders? | Code | 1 |
| Teaching Agents how to Map: Spatial Reasoning for Multi-Object Navigation | Code | 1 |
| StepGame: A New Benchmark for Robust Multi-Hop Spatial Reasoning in Texts | Code | 1 |
| SpatialSense: An Adversarially Crowdsourced Benchmark for Spatial Relation Recognition | Code | 1 |
| Talk2BEV: Language-enhanced Bird's-eye View Maps for Autonomous Driving | Code | 1 |
| Spatially Aware Multimodal Transformers for TextVQA | Code | 1 |
| Multi-scale GCN-assisted two-stage network for joint segmentation of retinal layers and disc in peripapillary OCT images | Code | 1 |
| End-to-End Egospheric Spatial Memory | Code | 1 |
| Can Large Language Models be Good Path Planners? A Benchmark and Investigation on Spatial-temporal Reasoning | Code | 1 |
| Geospatial Mechanistic Interpretability of Large Language Models | Code | 1 |
| GuessWhat?! Visual object discovery through multi-modal dialogue | Code | 1 |
| TopViewRS: Vision-Language Models as Top-View Spatial Reasoners | Code | 1 |
| VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs | Code | 1 |
| Visual Spatial Reasoning | Code | 1 |
| Enhancing Reasoning to Adapt Large Language Models for Domain-Specific Applications | Code | 1 |
| From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation | Code | 1 |
| SpaceNLI: Evaluating the Consistency of Predicting Inferences in Space | Code | 0 |
| SpaRC and SpaRP: Spatial Reasoning Characterization and Path Generation for Understanding Spatial Reasoning Capability of Large Language Models | Code | 0 |
| SORNet: Spatial Object-Centric Representations for Sequential Manipulation | Code | 0 |
| SPaRC: A Spatial Pathfinding Reasoning Challenge | Code | 0 |
| Bridging the Dynamic Perception Gap: Training-Free Draft Chain-of-Thought for Dynamic Multimodal Spatial Reasoning | Code | 0 |
| EgoHumans: An Egocentric 3D Multi-Human Benchmark | Code | 0 |
| Representation Learning for Grounded Spatial Reasoning | Code | 0 |
| Show, Don't Tell: Evaluating Large Language Models Beyond Textual Understanding with ChildPlay | Code | 0 |
| Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark | Code | 0 |
| Disentangling Extraction and Reasoning in Multi-hop Spatial Reasoning | Code | 0 |
| OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding | Code | 0 |
| DepWiGNN: A Depth-wise Graph Neural Network for Multi-hop Spatial Reasoning in Text | Code | 0 |
| Dense 2D-3D Indoor Prediction with Sound via Aligned Cross-Modal Distillation | Code | 0 |
| 3D CoCa: Contrastive Learners are 3D Captioners | Code | 0 |
| DeepSSN: a deep convolutional neural network to assess spatial scene similarity | Code | 0 |
| No Blind Spots: Full-Surround Multi-Object Tracking for Autonomous Vehicles using Cameras & LiDARs | Code | 0 |

No leaderboard results yet.