| Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese | Nov 2, 2022 | Contrastive Learning, Image Classification | Code Available | 5 |
| AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities | Nov 12, 2022 | Contrastive Learning, Cross-Modal Retrieval | Code Available | 4 |
| Cross-lingual and Multilingual CLIP | Jun 1, 2022 | Contrastive Learning, Image-text Retrieval | Code Available | 2 |
| InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | Dec 21, 2023 | Image Retrieval, Image-to-Text Retrieval | Code Available | 1 |
| General Image Descriptors for Open World Image Retrieval using ViT CLIP | Oct 20, 2022 | Image Retrieval, Retrieval | Code Available | 1 |
| Context-I2W: Mapping Images to Context-dependent Words for Accurate Zero-Shot Composed Image Retrieval | Sep 28, 2023 | Attribute, Image Retrieval | Code Available | 1 |
| FETA: Towards Specializing Foundation Models for Expert Task Applications | Sep 8, 2022 | Domain Generalization, Few-Shot Learning | Code Available | 1 |
| FACTUAL: A Benchmark for Faithful and Consistent Textual Scene Graph Parsing | May 27, 2023 | Graph Similarity, Human Judgment Correlation | Code Available | 1 |
| FLAVA: A Foundational Language And Vision Alignment Model | Dec 8, 2021 | Image Retrieval, Image-to-Text Retrieval | Code Available | 1 |
| Pic2Word: Mapping Pictures to Words for Zero-shot Composed Image Retrieval | Feb 6, 2023 | Attribute, Composed Image Retrieval (CoIR) | Code Available | 1 |