| An Evaluation of ChatGPT-4's Qualitative Spatial Reasoning Capabilities in RCC-8 | Sep 27, 2023 | Spatial Reasoning | —Unverified | 0 |
| Dense 2D-3D Indoor Prediction with Sound via Aligned Cross-Modal Distillation | Sep 20, 2023 | 3D Scene Reconstruction; Depth Estimation | Code Available | 0 |
| Multi-camera Bird's Eye View Perception for Autonomous Driving | Sep 16, 2023 | Autonomous Driving; Sensor Fusion | —Unverified | 0 |
| STUPD: A Synthetic Dataset for Spatial and Temporal Relation Reasoning | Sep 13, 2023 | Relation; Relationship Detection | Code Available | 0 |
| DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions | Sep 7, 2023 | Position; Spatial Reasoning | Code Available | 1 |
| Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | Aug 24, 2023 | Chart Question Answering; FS-MEVQA | Code Available | 5 |
| BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions | Aug 19, 2023 | MME; Optical Character Recognition (OCR) | Code Available | 2 |
| Towards Grounded Visual Spatial Reasoning in Multi-Modal Vision Language Models | Aug 18, 2023 | Image-text matching; Object Localization | —Unverified | 0 |
| Chat-3D: Data-efficiently Tuning Large Language Model for Universal Dialogue of 3D Scenes | Aug 17, 2023 | Language Modeling; Language Modelling | Code Available | 2 |
| Object Goal Navigation with Recursive Implicit Maps | Aug 10, 2023 | Navigate; Object | —Unverified | 0 |
| Spatial Intelligence of a Self-driving Car and Rule-Based Decision Making | Aug 2, 2023 | Autonomous Driving; Decision Making | —Unverified | 0 |
| SpaceNLI: Evaluating the Consistency of Predicting Inferences in Space | Jul 5, 2023 | Natural Language Inference; Negation | Code Available | 0 |
| Act3D: 3D Feature Field Transformers for Multi-Task Robotic Manipulation | Jun 30, 2023 | Action Detection; Pose Prediction | Code Available | 2 |
| A Universal Semantic-Geometric Representation for Robotic Manipulation | Jun 18, 2023 | 3D geometry; Robot Manipulation | Code Available | 1 |
| Controllable Text-to-Image Generation with GPT-4 | May 29, 2023 | Image Generation; Instruction Following | —Unverified | 0 |
| Neural Task Synthesis for Visual Programming | May 26, 2023 | Imitation Learning; Spatial Reasoning | Code Available | 0 |
| Improved Algorithms for Allen's Interval Algebra by Dynamic Programming with Sublinear Partitioning | May 25, 2023 | Spatial Reasoning | —Unverified | 0 |
| EgoHumans: An Egocentric 3D Multi-Human Benchmark | May 25, 2023 | 3D Pose Estimation; Human Detection | Code Available | 0 |
| LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models | May 23, 2023 | Common Sense Reasoning; Image Generation | Code Available | 2 |
| From Patches to Objects: Exploiting Spatial Reasoning for Better Visual Representations | May 21, 2023 | Contrastive Learning; Linear evaluation | —Unverified | 0 |
| Contextual Reasoning for Scene Generation (Technical Report) | May 3, 2023 | Scene Generation; Spatial Reasoning | —Unverified | 0 |
| Dialectical language model evaluation: An initial appraisal of the commonsense spatial reasoning abilities of LLMs | Apr 22, 2023 | Language Model Evaluation; Language Modeling | —Unverified | 0 |
| MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models | Apr 20, 2023 | Image Description; Language Modelling | Code Available | 7 |
| Visual Instruction Tuning | Apr 17, 2023 | 1 Image, 2*2 Stitching; 3D Question Answering (3D-QA) | Code Available | 6 |
| Are LLMs the Master of All Trades? : Exploring Domain-Agnostic Reasoning Skills of LLMs | Mar 22, 2023 | All; Spatial Reasoning | Code Available | 0 |