| CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation | Feb 27, 2025 | Image-text matching, Object | Code Available | 1 | 5 |
| CyCLIP: Cyclic Contrastive Language-Image Pretraining | May 28, 2022 | Representation Learning, Visual Reasoning | Code Available | 1 | 5 |
| From Local Details to Global Context: Advancing Vision-Language Models with Attention-Based Selection | May 19, 2025 | feature selection, Out-of-Distribution Generalization | Code Available | 1 | 5 |
| Efficient Medical Vision-Language Alignment Through Adapting Masked Vision Models | Jun 10, 2025 | Contrastive Learning, Image-text matching | Code Available | 1 | 5 |
| DST-Det: Simple Dynamic Self-Training for Open-Vocabulary Object Detection | Oct 2, 2023 | Novel Object Detection, Object | Code Available | 1 | 5 |
| EmoCLIP: A Vision-Language Method for Zero-Shot Video Facial Expression Recognition | Oct 25, 2023 | Facial Expression Recognition, Facial Expression Recognition (FER) | Code Available | 1 | 5 |
| Discovering Human Interactions With Novel Objects via Zero-Shot Learning | Jun 1, 2020 | Human-Object Interaction Detection, Object | Code Available | 1 | 5 |
| CLIP-Guided Source-Free Object Detection in Aerial Images | Jan 10, 2024 | Domain Adaptation, Object | Code Available | 1 | 5 |
| CLIPArTT: Adaptation of CLIP to New Domains at Test Time | May 1, 2024 | Pseudo Label, Test-time Adaptation | Code Available | 1 | 5 |
| Discriminative Region-based Multi-Label Zero-Shot Learning | Aug 20, 2021 | Image Retrieval, Multi-label zero-shot learning | Code Available | 1 | 5 |