| OpenKD: Opening Prompt Diversity for Zero- and Few-shot Keypoint Detection | Sep 30, 2024 | DiversityKeypoint Detection | CodeCode Available | 1 | 5 |
| Grounding Consistency: Distilling Spatial Common Sense for Precise Visual Relationship Detection | Jan 1, 2021 | Common Sense ReasoningGraph Generation | CodeCode Available | 1 | 5 |
| Self-supervised Spatial Reasoning on Multi-View Line Drawings | Apr 27, 2021 | Binary ClassificationContrastive Learning | CodeCode Available | 1 | 5 |
| GuessWhat?! Visual object discovery through multi-modal dialogue | Nov 23, 2016 | ObjectObject Discovery | CodeCode Available | 1 | 5 |
| Pix2Shape: Towards Unsupervised Learning of 3D Scenes from Images using a View-based Representation | Mar 23, 2020 | DecoderSpatial Reasoning | CodeCode Available | 1 | 5 |
| Enhancing Reasoning to Adapt Large Language Models for Domain-Specific Applications | Feb 5, 2025 | In-Context LearningLanguage Modeling | CodeCode Available | 1 | 5 |
| Open3DVQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space | Mar 14, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 1 | 5 |
| NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models | Mar 17, 2025 | Question AnsweringScene Understanding | CodeCode Available | 1 | 5 |
| Decoding Language Spatial Relations to 2D Spatial Arrangements | Nov 1, 2020 | Spatial Reasoning | CodeCode Available | 1 | 5 |
| End-to-End Egospheric Spatial Memory | Feb 15, 2021 | General Reinforcement LearningImitation Learning | CodeCode Available | 1 | 5 |