| ERNIE-ViL 2.0: Multi-view Contrastive Learning for Image-Text Pre-training | Sep 30, 2022 | Computational Efficiency, Contrastive Learning | Code Available | 0 | 5 |
| Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset | May 25, 2022 | Image Captioning, Image Retrieval | —Unverified | 0 | 0 |
| CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance | Dec 5, 2024 | Contrastive Learning, cross-modal alignment | —Unverified | 0 | 0 |
| CAVL: Learning Contrastive and Adaptive Representations of Vision and Language | Apr 10, 2023 | Image Retrieval, Phrase Grounding | —Unverified | 0 | 0 |
| An analysis of vision-language models for fabric retrieval | Jul 7, 2025 | Attribute, Cross-Modal Retrieval | —Unverified | 0 | 0 |