| I0T: Embedding Standardization Method Towards Zero Modality Gap | Dec 18, 2024 | Contrastive Learning, Image-text Retrieval | Code Available | 1 |
| FlexiViT: One Model for All Patch Sizes | Dec 15, 2022 | All, Image-text Retrieval | Code Available | 1 |
| CoSMo: Content-Style Modulation for Image Retrieval With Text Feedback | Jun 19, 2021 | Image Retrieval, Image-text Retrieval | Code Available | 1 |
| A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports | Sep 3, 2020 | Image-text Retrieval, Medical Visual Question Answering | Code Available | 1 |
| Image-text Retrieval via Preserving Main Semantics of Vision | Apr 20, 2023 | Cross-Modal Retrieval, Image-text Retrieval | Code Available | 1 |
| Large-Scale Adversarial Training for Vision-and-Language Representation Learning | Jun 11, 2020 | Image-text Retrieval, Question Answering | Code Available | 1 |
| LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Sparse Retrieval | Jan 1, 2023 | image-classification, Image Classification | Code Available | 1 |
| MLLMs-Augmented Visual-Language Representation Learning | Nov 30, 2023 | Image-text Retrieval, Representation Learning | Code Available | 1 |
| VladVA: Discriminative Fine-tuning of LVLMs | Dec 5, 2024 | Image-text Retrieval, Representation Learning | Unverified | 0 |
| Direction-Oriented Visual-semantic Embedding Model for Remote Sensing Image-text Retrieval | Oct 12, 2023 | Cross-Modal Retrieval, Image-text Retrieval | Unverified | 0 |