| BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature | Jan 13, 2025 | ArticlesImage-text Retrieval | CodeCode Available | 2 |
| VeCLIP: Improving CLIP Training via Visual-enriched Captions | Oct 11, 2023 | Image-text RetrievalRetrieval | CodeCode Available | 2 |
| Enhancing Remote Sensing Vision-Language Models for Zero-Shot Scene Classification | Sep 1, 2024 | Scene ClassificationTransductive Zero-Shot Classification | CodeCode Available | 2 |
| CLIP-Mamba: CLIP Pretrained Mamba Models with OOD and Hessian Evaluation | Apr 30, 2024 | MambaState Space Models | CodeCode Available | 2 |
| Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner | May 16, 2025 | Cross-Modal RetrievalDiagnostic | CodeCode Available | 2 |
| CorrCLIP: Reconstructing Correlations in CLIP with Off-the-Shelf Foundation Models for Open-Vocabulary Semantic Segmentation | Nov 15, 2024 | Open Vocabulary Semantic SegmentationOpen-Vocabulary Semantic Segmentation | CodeCode Available | 2 |
| Harnessing Explanations: LLM-to-LM Interpreter for Enhanced Text-Attributed Graph Representation Learning | May 31, 2023 | Decision MakingGeneral Knowledge | CodeCode Available | 2 |
| DiffCLIP: Differential Attention Meets CLIP | Mar 9, 2025 | Language ModelingLanguage Modelling | CodeCode Available | 2 |
| GeoVision Labeler: Zero-Shot Geospatial Classification with Vision and Language Models | May 30, 2025 | ClassificationDisaster Response | CodeCode Available | 2 |
| RemoteCLIP: A Vision Language Foundation Model for Remote Sensing | Jun 19, 2023 | ClassificationCross-Modal Retrieval | CodeCode Available | 2 |