| SLAN: Self-Locator Aided Network for Cross-Modal Understanding | Nov 28, 2022 | Image Retrieval, Image to text | —Unverified | 0 |
| SLAN: Self-Locator Aided Network for Vision-Language Understanding | Jan 1, 2023 | Image Retrieval, Image to text | —Unverified | 0 |
| SRCB at SemEval-2022 Task 5: Pretraining Based Image to Text Late Sequential Fusion System for Multimodal Misogynous Meme Identification | Jul 1, 2022 | Image to text | —Unverified | 0 |
| SurrogatePrompt: Bypassing the Safety Filter of Text-to-Image Models via Substitution | Sep 25, 2023 | Image to text | —Unverified | 0 |
| Survey of Visual-Semantic Embedding Methods for Zero-Shot Image Retrieval | May 16, 2021 | Graph Generation, Image Captioning | —Unverified | 0 |
| SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment | Jan 4, 2024 | Image Captioning, image-classification | —Unverified | 0 |
| Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image | Oct 20, 2024 | Image to text | —Unverified | 0 |
| Synthesizing Novel Pairs of Image and Text | Dec 18, 2017 | Image to text | —Unverified | 0 |
| Task-Oriented Multi-Modal Mutual Leaning for Vision-Language Models | Mar 30, 2023 | Image to text, Prompt Learning | —Unverified | 0 |
| TMCIR: Token Merge Benefits Composed Image Retrieval | Apr 15, 2025 | Contrastive Learning, cross-modal alignment | —Unverified | 0 |