| ViLEM: Visual-Language Error Modeling for Image-Text Retrieval | Jan 1, 2023 | Contrastive Learning, Image-text Retrieval | Unverified | 0 | 0 |
| VL-BEiT: Generative Vision-Language Pretraining | Jun 2, 2022 | Image Classification | Unverified | 0 | 0 |
| VLMAE: Vision-Language Masked Autoencoder | Aug 19, 2022 | Image-text Retrieval, Language Modeling | Unverified | 0 | 0 |
| VL-Match: Enhancing Vision-Language Pretraining with Token-Level and Instance-Level Matching | Jan 1, 2023 | Image-text Matching, Image-text Retrieval | Unverified | 0 | 0 |
| Webly Supervised Joint Embedding for Cross-Modal Image-Text Retrieval | Aug 23, 2018 | Cross-Modal Retrieval, Image-text Retrieval | Unverified | 0 | 0 |
| Webly Supervised Joint Embedding for Cross-Modal Image-Text Retrieval | Oct 1, 2018 | Cross-Modal Retrieval, Image-text Retrieval | Unverified | 0 | 0 |
| XGPT: Cross-modal Generative Pre-Training for Image Captioning | Mar 3, 2020 | Data Augmentation, Denoising | Unverified | 0 | 0 |
| Toward Automatic Relevance Judgment using Vision-Language Models for Image-Text Retrieval Evaluation | Aug 2, 2024 | Image-text Retrieval, Retrieval | Unverified | 0 | 0 |