| An Empirical Study of Remote Sensing Pretraining | Apr 6, 2022 | Aerial Scene Classification, Building change detection for remote sensing images | Code Available | 2 |
| Omnivore: A Single Model for Many Visual Modalities | Jan 20, 2022 | Action Classification, Action Recognition | Code Available | 2 |
| On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving | Nov 9, 2023 | Autonomous Driving, Common Sense Reasoning | Code Available | 2 |
| Indoor Scene Recognition in 3D | Feb 28, 2020 | 3D geometry, Multi-Task Learning | Code Available | 1 |
| Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition | May 18, 2020 | Scene Recognition | Code Available | 1 |
| Deep Attentional Structured Representation Learning for Visual Recognition | May 14, 2018 | Representation Learning, Scene Recognition | Code Available | 1 |
| BORM: Bayesian Object Relation Model for Indoor Scene Recognition | Aug 1, 2021 | Object, Relation | Code Available | 1 |
| A Prior Instruction Representation Framework for Remote Sensing Image-text Retrieval | Oct 27, 2023 | Cross-Modal Retrieval, Image-text Retrieval | Code Available | 1 |
| Bidirectional Projection Network for Cross Dimension Scene Understanding | Mar 26, 2021 | 2D Semantic Segmentation, 3D Semantic Segmentation | Code Available | 1 |
| A Study of Face Obfuscation in ImageNet | Mar 10, 2021 | Attribute, Object | Code Available | 1 |