| Knowledge Transfer Across Modalities with Natural Language Supervision | Nov 23, 2024 | Image-text Retrieval, Novel Concepts | —Unverified | 0 | 0 |
| Learning Multi-Modal Nonlinear Embeddings: Performance Bounds and an Algorithm | Jun 3, 2020 | cross-modal alignment, General Classification | —Unverified | 0 | 0 |
| Learning to embed semantic similarity for joint image-text retrieval | Oct 7, 2022 | Image-text Retrieval, Metric Learning | —Unverified | 0 | 0 |
| Multimodal Adversarial Defense for Vision-Language Models by Leveraging One-To-Many Relationships | May 29, 2024 | Adversarial Defense, Adversarial Robustness | —Unverified | 0 | 0 |
| LightCLIP: Learning Multi-Level Interaction for Lightweight Vision-Language Models | Dec 1, 2023 | image-classification, Image Classification | —Unverified | 0 | 0 |
| LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning | Mar 4, 2025 | Contrastive Learning, Image-text Retrieval | —Unverified | 0 | 0 |
| Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models | Nov 17, 2017 | Cross-Modal Retrieval, Image-text Retrieval | —Unverified | 0 | 0 |
| LoopITR: Combining Dual and Cross Encoder Architectures for Image-Text Retrieval | Mar 10, 2022 | Image-text Retrieval, Retrieval | —Unverified | 0 | 0 |
| LuoJiaHOG: A Hierarchy Oriented Geo-aware Image Caption Dataset for Remote Sensing Image-Text Retrival | Mar 16, 2024 | Caption Generation, Image-text Retrieval | —Unverified | 0 | 0 |
| MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning | Oct 9, 2022 | Image-text Retrieval, multimodal interaction | —Unverified | 0 | 0 |