| When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models | May 16, 2024 | In-Context Learning, Question Answering | Code Available | 7 |
| Generating Human Motion in 3D Scenes from Text Descriptions | May 13, 2024 | Motion Generation, Object | Unverified | 0 |
| DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding | May 10, 2024 | Relation, Spatial Reasoning | Code Available | 1 |
| RoboHop: Segment-based Topological Map Representation for Open-World Visual Navigation | May 9, 2024 | Natural Language Queries, Robot Navigation | Unverified | 0 |
| Beyond Human Vision: The Role of Large Vision Language Models in Microscope Image Analysis | May 1, 2024 | Image Captioning, Question Answering | Unverified | 0 |
| Re-Thinking Inverse Graphics With Large Language Models | Apr 23, 2024 | Language Modelling, Large Language Model | Unverified | 0 |
| Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs | Apr 11, 2024 | Descriptive, Hallucination | Code Available | 0 |
| HAMMR: HierArchical MultiModal React agents for generic VQA | Apr 8, 2024 | Optical Character Recognition (OCR), Question Answering | Unverified | 0 |
| Challenges Faced by Large Language Models in Solving Multi-Agent Flocking | Apr 6, 2024 | Decision Making, Spatial Reasoning | Unverified | 0 |
| Mind's Eye of LLMs: Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models | Apr 4, 2024 | Spatial Reasoning, Visual Navigation | Code Available | 1 |