| An Empirical Study of Remote Sensing Pretraining | Apr 6, 2022 | Aerial Scene Classification, Building change detection for remote sensing images | Code Available | 2 |
| Omnivore: A Single Model for Many Visual Modalities | Jan 20, 2022 | Action Classification, Action Recognition | Code Available | 2 |
| On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving | Nov 9, 2023 | Autonomous Driving, Common Sense Reasoning | Code Available | 2 |
| Where in the World is this Image? Transformer-based Geo-localization in the Wild | Apr 29, 2022 | Diversity, geo-localization | Code Available | 1 |
| Stochastic Partial Swap: Enhanced Model Generalization and Interpretability for Fine-Grained Recognition | Jan 1, 2021 | Material Recognition, Scene Recognition | Code Available | 1 |
| When CNNs Meet Random RNNs: Towards Multi-Level Analysis for RGB-D Object and Scene Recognition | Apr 26, 2020 | Object Recognition, Scene Recognition | Code Available | 1 |
| NuScenes-MQA: Integrated Evaluation of Captions and QA for Autonomous Driving Datasets using Markup Annotations | Dec 11, 2023 | Autonomous Driving, Descriptive | Code Available | 1 |
| Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition | May 18, 2020 | Scene Recognition | Code Available | 1 |
| MultiScene: A Large-scale Dataset and Benchmark for Multi-scene Recognition in Single Aerial Images | Apr 7, 2021 | Learning with noisy labels, Scene Recognition | Code Available | 1 |
| NEVIS'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision Research | Nov 15, 2022 | Continual Learning, Diversity | Code Available | 1 |
| PIR: Remote Sensing Image-Text Retrieval with Prior Instruction Representation Learning | May 16, 2024 | Image-text Retrieval, Representation Learning | Code Available | 1 |
| Self-supervised Video Representation Learning by Uncovering Spatio-temporal Statistics | Aug 31, 2020 | Action Recognition, Representation Learning | Code Available | 1 |
| Unsupervised Model Personalization while Preserving Privacy and Scalability: An Open Problem | Mar 30, 2020 | Continual Learning, Domain Adaptation | Code Available | 1 |
| Visual Memorability for Robotic Interestingness via Unsupervised Online Learning | May 18, 2020 | Decision Making, Incremental Learning | Code Available | 1 |
| Bidirectional Projection Network for Cross Dimension Scene Understanding | Mar 26, 2021 | 2D Semantic Segmentation, 3D Semantic Segmentation | Code Available | 1 |
| A Study of Face Obfuscation in ImageNet | Mar 10, 2021 | Attribute, Object | Code Available | 1 |
| CoMAE: Single Model Hybrid Pre-training on Small-Scale RGB-D Datasets | Feb 13, 2023 | Contrastive Learning, Representation Learning | Code Available | 1 |
| BORM: Bayesian Object Relation Model for Indoor Scene Recognition | Aug 1, 2021 | Object, Relation | Code Available | 1 |
| Indoor Scene Recognition in 3D | Feb 28, 2020 | 3D geometry, Multi-Task Learning | Code Available | 1 |
| MovieCLIP: Visual Scene Recognition in Movies | Oct 20, 2022 | Genre classification, Scene Recognition | Code Available | 1 |
| NarrativeXL: A Large-scale Dataset For Long-Term Memory Models | May 23, 2023 | Multiple-choice, Reading Comprehension | Code Available | 1 |
| A Prior Instruction Representation Framework for Remote Sensing Image-text Retrieval | Oct 27, 2023 | Cross-Modal Retrieval, Image-text Retrieval | Code Available | 1 |
| Deep Attentional Structured Representation Learning for Visual Recognition | May 14, 2018 | Representation Learning, Scene Recognition | Code Available | 1 |
| Object-to-Scene: Learning to Transfer Object Knowledge to Indoor Scene Recognition | Aug 1, 2021 | Object, Scene Recognition | Code Available | 1 |
| A Bag of Visual Words Approach for Symbols-Based Coarse-Grained Ancient Coin Classification | Apr 23, 2013 | General Classification, Scene Recognition | Unverified | 0 |