
Spatial Reasoning

Papers

Showing 51-100 of 453 papers

Title | Status | Hype
InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners | Code | 2
SpaceR: Reinforcing MLLMs in Video Spatial Reasoning | Code | 2
ThinkGeo: Evaluating Tool-Augmented Agents for Remote Sensing Tasks | Code | 2
SpartQA: A Textual Question Answering Benchmark for Spatial Reasoning | Code | 1
Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame Benchmark | Code | 1
SPARTQA: A Textual Question Answering Benchmark for Spatial Reasoning | Code | 1
BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues | Code | 1
SmartFreeEdit: Mask-Free Spatial-Aware Image Editing with Complex Instruction Understanding | Code | 1
SE-KGE: A Location-Aware Knowledge Graph Embedding Model for Geographic Question Answering and Spatial Semantic Lifting | Code | 1
SmartPlay: A Benchmark for LLMs as Intelligent Agents | Code | 1
ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting | Code | 1
SPARE3D: A Dataset for SPAtial REasoning on Three-View Line Drawings | Code | 1
An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models | Code | 1
ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension | Code | 1
Revisiting spatio-temporal layouts for compositional action recognition | Code | 1
SBEVNet: End-to-End Deep Stereo Layout Estimation | Code | 1
A Universal Semantic-Geometric Representation for Robotic Manipulation | Code | 1
Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Multimodal Models | Code | 1
Pix2Shape: Towards Unsupervised Learning of 3D Scenes from Images using a View-based Representation | Code | 1
DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding | Code | 1
Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities | Code | 1
Seeing is Not Reasoning: MVPBench for Graph-based Evaluation of Multi-path Visual Physical CoT | Code | 1
Spatially Aware Multimodal Transformers for TextVQA | Code | 1
Self-supervised Spatial Reasoning on Multi-View Line Drawings | Code | 1
Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models | Code | 1
Multi-scale GCN-assisted two-stage network for joint segmentation of retinal layers and disc in peripapillary OCT images | Code | 1
Logic-RAG: Augmenting Large Multimodal Models with Visual-Spatial Knowledge for Road Scene Understanding | Code | 1
AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding | Code | 1
Long Range Arena: A Benchmark for Efficient Transformers | Code | 1
NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models | Code | 1
Joint Spatio-Textual Reasoning for Answering Tourism Questions | Code | 1
Learning Action and Reasoning-Centric Image Editing from Videos and Simulations | Code | 1
CLIPort: What and Where Pathways for Robotic Manipulation | Code | 1
iVISPAR -- An Interactive Visual-Spatial Reasoning Benchmark for VLMs | Code | 1
Learning and Reasoning with the Graph Structure Representation in Robotic Surgery | Code | 1
Knot So Simple: A Minimalistic Environment for Spatial Reasoning | Code | 1
ING-VP: MLLMs cannot Play Easy Vision-based Games Yet | Code | 1
CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory | Code | 1
CoNav: Collaborative Cross-Modal Reasoning for Embodied Navigation | Code | 1
CityGPT: Empowering Urban Spatial Cognition of Large Language Models | Code | 1
3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation | Code | 1
LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments | Code | 1
On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability | Code | 1
MineAnyBuild: Benchmarking Spatial Planning for Open-world AI Agents | Code | 1
CityEQA: A Hierarchical LLM Agent on Embodied Question Answering Benchmark in City Space | Code | 1
Improved Visual-Spatial Reasoning via R1-Zero-Like Training | Code | 1
Open3DVQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space | Code | 1
OpenKD: Opening Prompt Diversity for Zero- and Few-shot Keypoint Detection | Code | 1
Decoding Language Spatial Relations to 2D Spatial Arrangements | Code | 1
HSPFormer: Hierarchical Spatial Perception Transformer for Semantic Segmentation | Code | 1

No leaderboard results yet.