| A Survey for Foundation Models in Autonomous Driving | Feb 2, 2024 | 3D Object DetectionAutonomous Driving | —Unverified | 0 |
| Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data | Jan 31, 2024 | BenchmarkingChange Detection | CodeCode Available | 0 |
| SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities | Jan 22, 2024 | Question AnsweringSpatial Reasoning | —Unverified | 0 |
| Seeing the roads through the trees: A benchmark for modeling spatial dependencies with aerial imagery | Jan 12, 2024 | Object RecognitionRoad Segmentation | CodeCode Available | 2 |
| StarCraftImage: A Dataset For Prototyping Spatial Reasoning Methods For Multi-Agent Environments | Jan 9, 2024 | ImputationReinforcement Learning (RL) | —Unverified | 0 |
| Distortions in Judged Spatial Relations in Large Language Models | Jan 8, 2024 | MisconceptionsSpatial Reasoning | —Unverified | 0 |
| Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame Benchmark | Jan 8, 2024 | Relation MappingSpatial Reasoning | CodeCode Available | 1 |
| Location Aware Modular Biencoder for Tourism Question Answering | Jan 4, 2024 | Question AnsweringRetrieval | CodeCode Available | 0 |
| LiDAR-LLM: Exploring the Potential of Large Language Models for 3D LiDAR Understanding | Dec 21, 2023 | Instruction FollowingLanguage Modeling | —Unverified | 0 |
| Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation | Dec 7, 2023 | Spatial ReasoningText-to-Video Generation | —Unverified | 0 |
| Inherent limitations of LLMs regarding spatial information | Dec 5, 2023 | Spatial Reasoning | CodeCode Available | 0 |
| Exploring and Improving the Spatial Reasoning Abilities of Large Language Models | Dec 2, 2023 | Spatial Reasoning | —Unverified | 0 |
| FollowEval: A Multi-Dimensional Benchmark for Assessing the Instruction-Following Capability of Large Language Models | Nov 16, 2023 | Instruction FollowingLogical Reasoning | —Unverified | 0 |
| On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving | Nov 9, 2023 | Autonomous DrivingCommon Sense Reasoning | CodeCode Available | 2 |
| What's "up" with vision-language models? Investigating their struggle with spatial reasoning | Oct 30, 2023 | Spatial Reasoning | CodeCode Available | 1 |
| Disentangling Extraction and Reasoning in Multi-hop Spatial Reasoning | Oct 25, 2023 | Spatial Reasoning | CodeCode Available | 0 |
| DepWiGNN: A Depth-wise Graph Neural Network for Multi-hop Spatial Reasoning in Text | Oct 19, 2023 | Graph Neural NetworkSpatial Reasoning | CodeCode Available | 0 |
| Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning | Oct 19, 2023 | MuJoCoPrompt Engineering | CodeCode Available | 1 |
| Evaluating Robustness of Visual Representations for Object Assembly Task Requiring Spatio-Geometrical Reasoning | Oct 15, 2023 | BenchmarkingSpatial Reasoning | —Unverified | 0 |
| Integrating Symbolic Reasoning into Neural Generative Models for Design Generation | Oct 13, 2023 | Spatial Reasoning | —Unverified | 0 |
| SlotGNN: Unsupervised Discovery of Multi-Object Representations and Visual Dynamics | Oct 6, 2023 | ObjectObject Discovery | —Unverified | 0 |
| Can Large Language Models be Good Path Planners? A Benchmark and Investigation on Spatial-temporal Reasoning | Oct 5, 2023 | NavigateSpatial Reasoning | CodeCode Available | 1 |
| Improved Baselines with Visual Instruction Tuning | Oct 5, 2023 | Factual Inconsistency Detection in Chart CaptioningImage Classification | CodeCode Available | 6 |
| Talk2BEV: Language-enhanced Bird's-eye View Maps for Autonomous Driving | Oct 3, 2023 | Autonomous DrivingDecision Making | CodeCode Available | 1 |
| SmartPlay: A Benchmark for LLMs as Intelligent Agents | Oct 2, 2023 | MinecraftSpatial Reasoning | CodeCode Available | 1 |