SOTAVerified

Spatial Reasoning

Papers

Showing 76–100 of 453 papers

| Title | Status | Hype |
| --- | --- | --- |
| Multi-scale GCN-assisted two-stage network for joint segmentation of retinal layers and disc in peripapillary OCT images | Code | 1 |
| Logic-RAG: Augmenting Large Multimodal Models with Visual-Spatial Knowledge for Road Scene Understanding | Code | 1 |
| AlanaVLM: A Multimodal Embodied AI Foundation Model for Egocentric Video Understanding | Code | 1 |
| Long Range Arena: A Benchmark for Efficient Transformers | Code | 1 |
| NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models | Code | 1 |
| Joint Spatio-Textual Reasoning for Answering Tourism Questions | Code | 1 |
| Learning Action and Reasoning-Centric Image Editing from Videos and Simulations | Code | 1 |
| CLIPort: What and Where Pathways for Robotic Manipulation | Code | 1 |
| iVISPAR -- An Interactive Visual-Spatial Reasoning Benchmark for VLMs | Code | 1 |
| Learning and Reasoning with the Graph Structure Representation in Robotic Surgery | Code | 1 |
| Knot So Simple: A Minimalistic Environment for Spatial Reasoning | Code | 1 |
| ING-VP: MLLMs cannot Play Easy Vision-based Games Yet | Code | 1 |
| CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory | Code | 1 |
| CoNav: Collaborative Cross-Modal Reasoning for Embodied Navigation | Code | 1 |
| CityGPT: Empowering Urban Spatial Cognition of Large Language Models | Code | 1 |
| 3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation | Code | 1 |
| LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments | Code | 1 |
| On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability | Code | 1 |
| MineAnyBuild: Benchmarking Spatial Planning for Open-world AI Agents | Code | 1 |
| CityEQA: A Hierarchical LLM Agent on Embodied Question Answering Benchmark in City Space | Code | 1 |
| Improved Visual-Spatial Reasoning via R1-Zero-Like Training | Code | 1 |
| Open3DVQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space | Code | 1 |
| OpenKD: Opening Prompt Diversity for Zero- and Few-shot Keypoint Detection | Code | 1 |
| Decoding Language Spatial Relations to 2D Spatial Arrangements | Code | 1 |
| HSPFormer: Hierarchical Spatial Perception Transformer for Semantic Segmentation | Code | 1 |
Page 4 of 19

No leaderboard results yet.