| SpaRC and SpaRP: Spatial Reasoning Characterization and Path Generation for Understanding Spatial Reasoning Capability of Large Language Models | Jun 7, 2024 | Spatial Reasoning | Code Available | 0 |
| TopViewRS: Vision-Language Models as Top-View Spatial Reasoners | Jun 4, 2024 | Multiple-choice, Spatial Reasoning | Code Available | 1 |
| SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models | Jun 3, 2024 | Language Modelling, Spatial Reasoning | Unverified | 0 |
| Reframing Spatial Reasoning Evaluation in Language Models: A Real-World Simulation Benchmark for Qualitative Reasoning | May 23, 2024 | Logical Reasoning Question Answering, Spatial Reasoning | Code Available | 0 |
| Can Large Language Models Create New Knowledge for Spatial Reasoning Tasks? | May 23, 2024 | Spatial Reasoning | Unverified | 0 |
| When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models | May 16, 2024 | In-Context Learning, Question Answering | Code Available | 7 |
| Generating Human Motion in 3D Scenes from Text Descriptions | May 13, 2024 | Motion Generation, Object | Unverified | 0 |
| DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding | May 10, 2024 | Relation, Spatial Reasoning | Code Available | 1 |
| RoboHop: Segment-based Topological Map Representation for Open-World Visual Navigation | May 9, 2024 | Natural Language Queries, Robot Navigation | Unverified | 0 |
| Beyond Human Vision: The Role of Large Vision Language Models in Microscope Image Analysis | May 1, 2024 | Image Captioning, Question Answering | Unverified | 0 |
| Re-Thinking Inverse Graphics With Large Language Models | Apr 23, 2024 | Language Modelling, Large Language Model | Unverified | 0 |
| Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs | Apr 11, 2024 | Descriptive, Hallucination | Code Available | 0 |
| HAMMR: HierArchical MultiModal React agents for generic VQA | Apr 8, 2024 | Optical Character Recognition (OCR), Question Answering | Unverified | 0 |
| Challenges Faced by Large Language Models in Solving Multi-Agent Flocking | Apr 6, 2024 | Decision Making, Spatial Reasoning | Unverified | 0 |
| Mind's Eye of LLMs: Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models | Apr 4, 2024 | Spatial Reasoning, Visual Navigation | Code Available | 1 |
| Getting it Right: Improving Spatial Consistency in Text-to-Image Models | Apr 1, 2024 | Spatial Reasoning | Code Available | 2 |
| Grounding Spatial Relations in Text-Only Language Models | Mar 20, 2024 | Spatial Reasoning | Code Available | 0 |
| SpatialPIN: Enhancing Spatial Reasoning Capabilities of Vision-Language Models through Prompting and Interacting 3D Priors | Mar 18, 2024 | Hallucination, Motion Planning | Unverified | 0 |
| JSTR: Joint Spatio-Temporal Reasoning for Event-based Moving Object Detection | Mar 12, 2024 | Motion Compensation, Moving Object Detection | Unverified | 0 |
| DivCon: Divide and Conquer for Progressive Text-to-Image Generation | Mar 11, 2024 | Image Generation, Layout-to-Image Generation | Unverified | 0 |
| Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training | Mar 4, 2024 | Math, Phrase Grounding | Unverified | 0 |
| A Surprising Failure? Multimodal LLMs and the NLVR Challenge | Feb 26, 2024 | Sentence, Spatial Reasoning | Unverified | 0 |
| LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments | Feb 26, 2024 | Spatial Reasoning | Code Available | 1 |
| DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models | Feb 19, 2024 | Autonomous Driving, Scene Understanding | Unverified | 0 |
| PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs | Feb 12, 2024 | Instruction Following, Logical Reasoning | Unverified | 0 |