| RemoteCLIP: A Vision Language Foundation Model for Remote Sensing | Jun 19, 2023 | Classification, Cross-Modal Retrieval | Code Available | 2 |
| CARZero: Cross-Attention Alignment for Radiology Zero-Shot Classification | Feb 27, 2024 | Classification, Diagnostic | Code Available | 2 |
| RWKV-CLIP: A Robust Vision-Language Representation Learner | Jun 11, 2024 | Image-text Retrieval, Representation Learning | Code Available | 2 |
| Your Diffusion Model is Secretly a Zero-Shot Classifier | Mar 28, 2023 | Domain Generalization, Fine-Grained Image Classification | Code Available | 2 |
| Advancing Medical Representation Learning Through High-Quality Data | Mar 18, 2025 | Representation Learning, zero-shot-classification | Code Available | 1 |
| ArcheType: A Novel Framework for Open-Source Column Type Annotation using Large Language Models | Oct 27, 2023 | Column Type Annotation, Table annotation | Code Available | 1 |
| Exploring the Spectrum of Visio-Linguistic Compositionality and Recognition | Jun 13, 2024 | Retrieval, zero-shot-classification | Code Available | 1 |
| Exploring Vision-Language Models for Imbalanced Learning | Apr 4, 2023 | Decoder, zero-shot-classification | Code Available | 1 |
| Florence: A New Foundation Model for Computer Vision | Nov 22, 2021 | Action Classification, Action Recognition | Code Available | 1 |
| Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models | Jun 10, 2025 | Contrastive Learning, Image-text matching | Code Available | 1 |
| DST-Det: Simple Dynamic Self-Training for Open-Vocabulary Object Detection | Oct 2, 2023 | Novel Object Detection, Object | Code Available | 1 |
| EmoCLIP: A Vision-Language Method for Zero-Shot Video Facial Expression Recognition | Oct 25, 2023 | Facial Expression Recognition, Facial Expression Recognition (FER) | Code Available | 1 |
| Discovering Human Interactions With Novel Objects via Zero-Shot Learning | Jun 1, 2020 | Human-Object Interaction Detection, Object | Code Available | 1 |
| Differentiable Model Scaling using Differentiable Topk | May 12, 2024 | GPU, image-classification | Code Available | 1 |
| Discriminative Region-based Multi-Label Zero-Shot Learning | Aug 20, 2021 | Image Retrieval, Multi-label zero-shot learning | Code Available | 1 |
| From Local Details to Global Context: Advancing Vision-Language Models with Attention-Based Selection | May 19, 2025 | feature selection, Out-of-Distribution Generalization | Code Available | 1 |
| CyCLIP: Cyclic Contrastive Language-Image Pretraining | May 28, 2022 | Representation Learning, Visual Reasoning | Code Available | 1 |
| Controlling Latent Diffusion Using Latent CLIP | Mar 11, 2025 | Denoising, Descriptive | Code Available | 1 |
| Contrastive Language-Image Pre-training for the Italian Language | Aug 19, 2021 | Image Retrieval, Multi-label zero-shot learning | Code Available | 1 |
| DC3DO: Diffusion Classifier for 3D Objects | Aug 13, 2024 | 3D Object Classification, Classification | Code Available | 1 |
| CountCLIP -- [Re] Teaching CLIP to Count to Ten | Jun 5, 2024 | zero-shot-classification, Zero-Shot Counting | Code Available | 1 |
| CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation | Feb 27, 2025 | Image-text matching, Object | Code Available | 1 |
| CLIP meets DINO for Tuning Zero-Shot Classifier using Unlabeled Image Collections | Nov 28, 2024 | image-classification, Image Classification | Code Available | 1 |
| Open-Pose 3D Zero-Shot Learning: Benchmark and Challenges | Dec 12, 2023 | 3D Object Classification, Classification | Code Available | 1 |
| CLIPure: Purification in Latent Space via CLIP for Adversarially Robust Zero-Shot Classification | Feb 25, 2025 | Denoising, zero-shot-classification | Code Available | 1 |