SOTAVerified

Spatial Reasoning

Papers

Showing 1–25 of 453 papers

| Title | Status | Hype |
| --- | --- | --- |
| When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models | Code | 7 |
| MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models | Code | 7 |
| Improved Baselines with Visual Instruction Tuning | Code | 6 |
| Visual Instruction Tuning | Code | 6 |
| GPT-4 Technical Report | Code | 6 |
| Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | Code | 5 |
| Sonata: Self-Supervised Learning of Reliable Point Representations | Code | 4 |
| Factorio Learning Environment | Code | 4 |
| Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces | Code | 4 |
| SAT: Dynamic Spatial Aptitude Training for Multimodal Language Models | Code | 4 |
| PointVLA: Injecting the 3D World into Vision-Language-Action Models | Code | 4 |
| Video-R1: Reinforcing Video Reasoning in MLLMs | Code | 4 |
| SpatialBot: Precise Spatial Understanding with Vision Language Models | Code | 3 |
| Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models | Code | 3 |
| MetaSpatial: Reinforcing 3D Spatial Reasoning in VLMs for the Metaverse | Code | 3 |
| CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos | Code | 3 |
| SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation | Code | 3 |
| VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction | Code | 3 |
| InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners | Code | 2 |
| Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead | Code | 2 |
| Introducing Visual Perception Token into Multimodal Large Language Model | Code | 2 |
| ConceptFusion: Open-set Multimodal 3D Mapping | Code | 2 |
| AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO | Code | 2 |
| Imagine while Reasoning in Space: Multimodal Visualization-of-Thought | Code | 2 |
| IRef-VLA: A Benchmark for Interactive Referential Grounding with Imperfect Language in 3D Scenes | Code | 2 |
Page 1 of 19