| VL-Taboo: An Analysis of Attribute-based Zero-shot Capabilities of Vision-Language Models | Sep 12, 2022 | Attribute, Image-text Retrieval | Code Available | 0 |
| MultiWay-Adapater: Adapting large-scale multi-modal models for scalable image-text retrieval | Sep 4, 2023 | Image-text Retrieval, Retrieval | Code Available | 0 |
| Reversed in Time: A Novel Temporal-Emphasized Benchmark for Cross-Modal Video-Text Retrieval | Dec 26, 2024 | Image-text Retrieval, Information Retrieval | Code Available | 0 |
| Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark | Feb 14, 2022 | Benchmarking, Contrastive Learning | Code Available | 0 |
| Embracing Language Inclusivity and Diversity in CLIP through Continual Language Learning | Jan 30, 2024 | Diversity, Image-text Retrieval | Code Available | 0 |
| From Unimodal to Multimodal: Scaling up Projectors to Align Modalities | Sep 28, 2024 | Image-text Retrieval, Semantic Similarity | Code Available | 0 |
| Multi-stage Pre-training over Simplified Multimodal Pre-training Models | Jul 22, 2021 | Image-text Retrieval, Retrieval | Code Available | 0 |
| Dissecting Deep Metric Learning Losses for Image-Text Retrieval | Oct 21, 2022 | Cross-Modal Retrieval, Image-text matching | Code Available | 0 |
| Multilingual Vision-Language Pre-training for the Remote Sensing Domain | Oct 30, 2024 | Cross-Modal Retrieval, image-classification | Code Available | 0 |
| FiCo-ITR: bridging fine-grained and coarse-grained image-text retrieval for comparative performance analysis | Jul 29, 2024 | Image-text Retrieval, Model Selection | Code Available | 0 |