| On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving | Nov 9, 2023 | Autonomous Driving, Common Sense Reasoning | Code Available | 2 |
| An Empirical Study of Remote Sensing Pretraining | Apr 6, 2022 | Aerial Scene Classification, Building change detection for remote sensing images | Code Available | 2 |
| Omnivore: A Single Model for Many Visual Modalities | Jan 20, 2022 | Action Classification, Action Recognition | Code Available | 2 |
| PIR: Remote Sensing Image-Text Retrieval with Prior Instruction Representation Learning | May 16, 2024 | Image-text Retrieval, Representation Learning | Code Available | 1 |
| NuScenes-MQA: Integrated Evaluation of Captions and QA for Autonomous Driving Datasets using Markup Annotations | Dec 11, 2023 | Autonomous Driving, Descriptive | Code Available | 1 |
| A Prior Instruction Representation Framework for Remote Sensing Image-text Retrieval | Oct 27, 2023 | Cross-Modal Retrieval, Image-text Retrieval | Code Available | 1 |
| NarrativeXL: A Large-scale Dataset For Long-Term Memory Models | May 23, 2023 | Multiple-choice, Reading Comprehension | Code Available | 1 |
| CoMAE: Single Model Hybrid Pre-training on Small-Scale RGB-D Datasets | Feb 13, 2023 | Contrastive Learning, Representation Learning | Code Available | 1 |
| NEVIS'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision Research | Nov 15, 2022 | Continual Learning, Diversity | Code Available | 1 |
| MovieCLIP: Visual Scene Recognition in Movies | Oct 20, 2022 | Genre classification, Scene Recognition | Code Available | 1 |
| Where in the World is this Image? Transformer-based Geo-localization in the Wild | Apr 29, 2022 | Diversity, geo-localization | Code Available | 1 |
| Object-to-Scene: Learning to Transfer Object Knowledge to Indoor Scene Recognition | Aug 1, 2021 | Object, Scene Recognition | Code Available | 1 |
| BORM: Bayesian Object Relation Model for Indoor Scene Recognition | Aug 1, 2021 | Object, Relation | Code Available | 1 |
| MultiScene: A Large-scale Dataset and Benchmark for Multi-scene Recognition in Single Aerial Images | Apr 7, 2021 | Learning with noisy labels, Scene Recognition | Code Available | 1 |
| Bidirectional Projection Network for Cross Dimension Scene Understanding | Mar 26, 2021 | 2D Semantic Segmentation, 3D Semantic Segmentation | Code Available | 1 |
| A Study of Face Obfuscation in ImageNet | Mar 10, 2021 | Attribute, Object | Code Available | 1 |
| Stochastic Partial Swap: Enhanced Model Generalization and Interpretability for Fine-Grained Recognition | Jan 1, 2021 | Material Recognition, Scene Recognition | Code Available | 1 |
| Self-supervised Video Representation Learning by Uncovering Spatio-temporal Statistics | Aug 31, 2020 | Action Recognition, Representation Learning | Code Available | 1 |
| Visual Memorability for Robotic Interestingness via Unsupervised Online Learning | May 18, 2020 | Decision Making, Incremental Learning | Code Available | 1 |
| Cross-Task Transfer for Geotagged Audiovisual Aerial Scene Recognition | May 18, 2020 | Scene Recognition | Code Available | 1 |
| When CNNs Meet Random RNNs: Towards Multi-Level Analysis for RGB-D Object and Scene Recognition | Apr 26, 2020 | Object Recognition, Scene Recognition | Code Available | 1 |
| Unsupervised Model Personalization while Preserving Privacy and Scalability: An Open Problem | Mar 30, 2020 | Continual Learning, Domain Adaptation | Code Available | 1 |
| Indoor Scene Recognition in 3D | Feb 28, 2020 | 3D geometry, Multi-Task Learning | Code Available | 1 |
| Deep Attentional Structured Representation Learning for Visual Recognition | May 14, 2018 | Representation Learning, Scene Recognition | Code Available | 1 |
| Open-Vocabulary Semantic Segmentation with Uncertainty Alignment for Robotic Scene Understanding in Indoor Building Environments | Mar 29, 2025 | Navigate, Open Vocabulary Semantic Segmentation | Unverified | 0 |
| Lightweight Multimodal Artificial Intelligence Framework for Maritime Multi-Scene Recognition | Mar 10, 2025 | Disaster Response, Large Language Model | Unverified | 0 |
| Contrastive Visual Data Augmentation | Feb 24, 2025 | Data Augmentation, Novel Concepts | Unverified | 0 |
| Seeing with Partial Certainty: Conformal Prediction for Robotic Scene Recognition in Built Environments | Jan 9, 2025 | Conformal Prediction, Hallucination | Unverified | 0 |
| Advancing ALS Applications with Large-Scale Pre-training: Dataset Development and Downstream Assessment | Jan 9, 2025 | Scene Recognition, Self-Supervised Learning | Code Available | 0 |
| Vision-Language Models for Autonomous Driving: CLIP-Based Dynamic Scene Understanding | Jan 9, 2025 | Autonomous Driving, In-Context Learning | Unverified | 0 |
| Sound Bridge: Associating Egocentric and Exocentric Videos via Audio Cues | Jan 1, 2025 | Action Recognition, Scene Recognition | Code Available | 0 |
| SAIST: Segment Any Infrared Small Target Model Guided by Contrastive Language-Image Pretraining | Jan 1, 2025 | Scene Recognition | Unverified | 0 |
| Movement Control of Smart Mosque's Domes using CSRNet and Fuzzy Logic Techniques | Oct 13, 2024 | Scene Recognition | Unverified | 0 |
| A Retention-Centric Framework for Continual Learning with Guaranteed Model Developmental Safety | Oct 4, 2024 | Autonomous Driving, Continual Learning | Code Available | 0 |
| Rethinking VLMs and LLMs for Image Classification | Oct 3, 2024 | Classification, image-classification | Unverified | 0 |
| Less yet robust: crucial region selection for scene recognition | Sep 23, 2024 | Scene Classification, Scene Recognition | Unverified | 0 |
| CARIn: Constraint-Aware and Responsive Inference on Heterogeneous Devices for Single- and Multi-DNN Workloads | Sep 2, 2024 | Scene Recognition, text-classification | Unverified | 0 |
| Indoor scene recognition from images under visual corruptions | Aug 23, 2024 | Scene Recognition | Unverified | 0 |
| Vision Language Model for Interpretable and Fine-grained Detection of Safety Compliance in Diverse Workplaces | Aug 13, 2024 | Attribute, Language Modeling | Unverified | 0 |
| Multi-task Prompt Words Learning for Social Media Content Generation | Jul 10, 2024 | Keyword Extraction, Scene Recognition | Unverified | 0 |
| Advancing Ubiquitous Wireless Connectivity through Channel Twinning | Jun 18, 2024 | Scene Recognition | Unverified | 0 |
| Non-negative Subspace Feature Representation for Few-shot Learning in Medical Imaging | Apr 3, 2024 | Attribute, Dimensionality Reduction | Unverified | 0 |
| TextBlockV2: Towards Precise-Detection-Free Scene Text Spotting with Pre-trained Language Model | Mar 15, 2024 | Language Modeling, Language Modelling | Unverified | 0 |
| A Survey of Vision Transformers in Autonomous Driving: Current Trends and Future Directions | Mar 12, 2024 | Autonomous Driving, Decoder | Unverified | 0 |
| Leveraging Self-Supervised Learning for Scene Classification in Child Sexual Abuse Imagery | Mar 2, 2024 | Scene Classification, Scene Recognition | Unverified | 0 |
| Digital Divides in Scene Recognition: Uncovering Socioeconomic Biases in Deep Learning Systems | Jan 23, 2024 | Scene Classification, Scene Recognition | Unverified | 0 |
| Knowledge-enhanced Multi-perspective Video Representation Learning for Scene Recognition | Jan 9, 2024 | Representation Learning, Scene Recognition | Unverified | 0 |
| Inter-object Discriminative Graph Modeling for Indoor Scene Recognition | Nov 10, 2023 | Object, Scene Recognition | Unverified | 0 |
| Counting Manatee Aggregations using Deep Neural Networks and Anisotropic Gaussian Kernel | Nov 4, 2023 | Crowd Counting, Scene Recognition | Code Available | 0 |
| A Multi-Modal Foundation Model to Assist People with Blindness and Low Vision in Environmental Interaction | Oct 31, 2023 | Language Modeling, Language Modelling | Unverified | 0 |