| DiffCLIP: Differential Attention Meets CLIP | Mar 9, 2025 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| CorrCLIP: Reconstructing Correlations in CLIP with Off-the-Shelf Foundation Models for Open-Vocabulary Semantic Segmentation | Nov 15, 2024 | Open Vocabulary Semantic Segmentation, Open-Vocabulary Semantic Segmentation | Code Available | 2 | 5 |
| Enhancing Remote Sensing Vision-Language Models for Zero-Shot Scene Classification | Sep 1, 2024 | Scene Classification, Transductive Zero-Shot Classification | Code Available | 2 | 5 |
| GeoVision Labeler: Zero-Shot Geospatial Classification with Vision and Language Models | May 30, 2025 | Classification, Disaster Response | Code Available | 2 | 5 |
| Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner | May 16, 2025 | Cross-Modal Retrieval, Diagnostic | Code Available | 2 | 5 |
| CLIP-Mamba: CLIP Pretrained Mamba Models with OOD and Hessian Evaluation | Apr 30, 2024 | Mamba, State Space Models | Code Available | 2 | 5 |
| BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature | Jan 13, 2025 | Articles, Image-text Retrieval | Code Available | 2 | 5 |
| Boosting Vision-Language Models for Histopathology Classification: Predict all at once | Sep 3, 2024 | All, zero-shot-classification | Code Available | 2 | 5 |
| CARZero: Cross-Attention Alignment for Radiology Zero-Shot Classification | Feb 27, 2024 | Classification, Diagnostic | Code Available | 2 | 5 |
| RemoteCLIP: A Vision Language Foundation Model for Remote Sensing | Jun 19, 2023 | Classification, Cross-Modal Retrieval | Code Available | 2 | 5 |