| SeLIP: Similarity Enhanced Contrastive Language Image Pretraining for Multi-modal Head MRI | Mar 25, 2025 | Contrastive Learning, Image Segmentation | —Unverified | 0 |
| SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features | Feb 20, 2025 | Fairness, Image-text Retrieval | —Unverified | 0 |
| Survey of Visual-Semantic Embedding Methods for Zero-Shot Image Retrieval | May 16, 2021 | Graph Generation, Image Captioning | —Unverified | 0 |
| SwAMP: Swapped Assignment of Multi-Modal Pairs for Cross-Modal Retrieval | Nov 10, 2021 | Contrastive Learning, Cross-Modal Retrieval | —Unverified | 0 |
| Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input | Jun 25, 2023 | Diversity, Image-text Retrieval | —Unverified | 0 |
| SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment | Jan 4, 2024 | Image Captioning, image-classification | —Unverified | 0 |
| The style transformer with common knowledge optimization for image-text retrieval | Mar 1, 2023 | Image-text Retrieval, Retrieval | —Unverified | 0 |
| TSVC: Tripartite Learning with Semantic Variation Consistency for Robust Image-Text Retrieval | Jan 19, 2025 | Cross-Modal Retrieval, Image-text Retrieval | —Unverified | 0 |
| UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training | Apr 1, 2021 | Image-text matching, Image-text Retrieval | —Unverified | 0 |
| UFO: A UniFied TransfOrmer for Vision-Language Representation Learning | Nov 19, 2021 | Image Captioning, Image-text matching | —Unverified | 0 |