| SlotGNN: Unsupervised Discovery of Multi-Object Representations and Visual Dynamics | Oct 6, 2023 | Object, Object Discovery | Unverified | 0 |
| Can Large Language Models be Good Path Planners? A Benchmark and Investigation on Spatial-temporal Reasoning | Oct 5, 2023 | Navigate, Spatial Reasoning | Code Available | 1 |
| Improved Baselines with Visual Instruction Tuning | Oct 5, 2023 | Factual Inconsistency Detection in Chart Captioning, Image Classification | Code Available | 6 |
| Talk2BEV: Language-enhanced Bird's-eye View Maps for Autonomous Driving | Oct 3, 2023 | Autonomous Driving, Decision Making | Code Available | 1 |
| SmartPlay: A Benchmark for LLMs as Intelligent Agents | Oct 2, 2023 | Minecraft, Spatial Reasoning | Code Available | 1 |
| An Evaluation of ChatGPT-4's Qualitative Spatial Reasoning Capabilities in RCC-8 | Sep 27, 2023 | Spatial Reasoning | Unverified | 0 |
| Dense 2D-3D Indoor Prediction with Sound via Aligned Cross-Modal Distillation | Sep 20, 2023 | 3D Scene Reconstruction, Depth Estimation | Code Available | 0 |
| Multi-camera Bird's Eye View Perception for Autonomous Driving | Sep 16, 2023 | Autonomous Driving, Sensor Fusion | Unverified | 0 |
| STUPD: A Synthetic Dataset for Spatial and Temporal Relation Reasoning | Sep 13, 2023 | Relation, Relationship Detection | Code Available | 0 |
| DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions | Sep 7, 2023 | Position, Spatial Reasoning | Code Available | 1 |