| On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving | Nov 9, 2023 | Autonomous Driving, Common Sense Reasoning | Code Available | 2 |
| An Empirical Study of Remote Sensing Pretraining | Apr 6, 2022 | Aerial Scene Classification, Building change detection for remote sensing images | Code Available | 2 |
| Omnivore: A Single Model for Many Visual Modalities | Jan 20, 2022 | Action Classification, Action Recognition | Code Available | 2 |
| PIR: Remote Sensing Image-Text Retrieval with Prior Instruction Representation Learning | May 16, 2024 | Image-text Retrieval, Representation Learning | Code Available | 1 |
| NuScenes-MQA: Integrated Evaluation of Captions and QA for Autonomous Driving Datasets using Markup Annotations | Dec 11, 2023 | Autonomous Driving, Descriptive | Code Available | 1 |
| A Prior Instruction Representation Framework for Remote Sensing Image-text Retrieval | Oct 27, 2023 | Cross-Modal Retrieval, Image-text Retrieval | Code Available | 1 |
| NarrativeXL: A Large-scale Dataset For Long-Term Memory Models | May 23, 2023 | Multiple-choice, Reading Comprehension | Code Available | 1 |
| CoMAE: Single Model Hybrid Pre-training on Small-Scale RGB-D Datasets | Feb 13, 2023 | Contrastive Learning, Representation Learning | Code Available | 1 |
| NEVIS'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision Research | Nov 15, 2022 | Continual Learning, Diversity | Code Available | 1 |
| MovieCLIP: Visual Scene Recognition in Movies | Oct 20, 2022 | Genre classification, Scene Recognition | Code Available | 1 |