| RWKV-CLIP: A Robust Vision-Language Representation Learner | Jun 11, 2024 | Image-text Retrieval, Representation Learning | Code Available | 2 | 5 |
| MedCLIP: Contrastive Learning from Unpaired Medical Images and Text | Oct 18, 2022 | Contrastive Learning, Image-text Retrieval | Code Available | 2 | 5 |
| Frozen Transformers in Language Models Are Effective Visual Encoder Layers | Oct 19, 2023 | Action Recognition, Image-text Retrieval | Code Available | 2 | 5 |
| PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents | Mar 13, 2023 | Image Classification | Code Available | 2 | 5 |
| BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature | Jan 13, 2025 | Articles, Image-text Retrieval | Code Available | 2 | 5 |
| VeCLIP: Improving CLIP Training via Visual-enriched Captions | Oct 11, 2023 | Image-text Retrieval, Retrieval | Code Available | 2 | 5 |
| DreamLIP: Language-Image Pre-training with Long Captions | Mar 25, 2024 | Contrastive Learning, Image-text Retrieval | Code Available | 2 | 5 |
| Cross-lingual and Multilingual CLIP | Jun 1, 2022 | Contrastive Learning, Image-text Retrieval | Code Available | 2 | 5 |
| Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval | Mar 8, 2024 | Image-text Retrieval, Retrieval | Code Available | 2 | 5 |
| FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation | Jun 10, 2025 | Image-text Retrieval, Question Answering | Code Available | 2 | 5 |