| Continual learning in cross-modal retrieval | Apr 14, 2021 | Continual Learningcross-modal alignment | —Unverified | 0 |
| Contrastive Feature Masking Open-Vocabulary Vision Transformer | Sep 2, 2023 | Contrastive LearningImage-text Retrieval | —Unverified | 0 |
| CosmoCLIP: Generalizing Large Vision-Language Models for Astronomical Imaging | Jul 10, 2024 | Contrastive LearningImage-text Retrieval | —Unverified | 0 |
| COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval | Apr 15, 2022 | Contrastive LearningCross-Modal Retrieval | —Unverified | 0 |
| CPL: Counterfactual Prompt Learning for Vision and Language Models | Oct 19, 2022 | counterfactualimage-classification | —Unverified | 0 |
| Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset | May 25, 2022 | Image CaptioningImage Retrieval | —Unverified | 0 |
| A New Fine-grained Alignment Method for Image-text Matching | Nov 3, 2023 | Image-text matchingImage-text Retrieval | —Unverified | 0 |
| CtrlSynth: Controllable Image Text Synthesis for Data-Efficient Multimodal Learning | Oct 15, 2024 | Image-text RetrievalText Retrieval | —Unverified | 0 |
| DCFormer: Efficient 3D Vision-Language Modeling with Decomposed Convolutions | Feb 7, 2025 | Anomaly DetectionImage-text Retrieval | —Unverified | 0 |
| Deep Semantic Multimodal Hashing Network for Scalable Image-Text and Video-Text Retrievals | Jan 9, 2019 | Cross-Modal RetrievalDeep Hashing | —Unverified | 0 |